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ABSTRACT 

This paper describes ZEUS-MP, a multi- physics, massively paraUel, message-passing implementation 
of the ZEUS code. ZEUS-MP differs significantly from the thoroughly documented ZEUS-2D code, the 
completely undocumented (in peer-reviewed literature) ZEUS-3D code, and a marginally documented 
"version 1" of ZEUS-MP first distributed publicly in 1999. ZEUS-MP offers an MHD algorithm which 
is better suited for multidimensional flows than the ZEUS-2D module by virtue of modifications to 
the Method of Characteristics scheme first suggested by Hawley & Stone (1995). This MHD module 
is shown to compare quite favorably to the TVD scheme described by Ryu et al. (1998). ZEUS-MP is 
the first publicly-available ZEUS code to allow the advection of multiple chemical (or nuclear) species. 
Radiation hydrodynamic simulations are enabled via an implicit flux-limited radiation diffusion (FLD) 
module. The hydrodynamic, MHD, and FLD modules may be used, singly or in concert, in one, two, 
or three space dimensions. Additionally, so-called "1.5-D" and "2.5-D" grids, in which the "half-D" 
denotes a symmetry axis along which a constant but non-zero value of velocity or magnetic field 
is evolved, are supported. Self gravity may be included either through the assumption of a GM/r 
potential or a solution of Poisson's equation using one of three linear solver packages (conjugate- 
gradient, multigrid, and FFT) provided for that purpose. Point-mass potentials are also supported. 

Because ZEUS-MP is designed for large simulations on parallel computing platforms, considerable 
attention is paid to the parallel performance characteristics of each module in the code. Strong-scaling 
tests involving pure hydrodynamics (with and without self-gravity), MHD, and RHD are performed in 
which large problems (256^ zones) are distributed among as many as 1024 processors of an IBM SP3. 
Parallel efficiency is a strong function of the amount of communication required between processors 
in a given algorithm, but all modules are shown to scale well on up to 1024 processors for the chosen 
fixed problem size. 

Subject headings: hydrodynamics - methods:numerical - methods:parallel - MHD - radiative transfer 



1. INTRODUCTION 

Since their formal introduction in the literature, the 
ZEUS simulation codes have enjoyed widespread use in 
the numerical astrophysics community, having been ap- 
plied to such topics as planetary nebulae (Garci'a-Segura 
et al. 1999), molecular cloud turbulence (Mac Low 1999), 
solar magnetic arcades (Low & Manchester 2000), and 
galactic spiral arm formation (Martos et al. 2004a, b). 
The numerical methods used in the axisymmetric ZEUS- 
2D code are documented in an often-cited trio of pa- 
pers (Stone & Norman 1992a,b; Stone et al. 1992b) 
well familiar to the computational astrophysics commu- 
nity. A reasonable first question to ask regarding this 
report might therefore be, "why write another ZEUS 
'method' paper?" The first reason is that the code we 
describe in this paper, ZEUS-MP, is a significantly dif- 
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ferent code when compared to the highly-documented 
ZEUS-2D code, the completely undocumented (in peer- 
reviewed literature) ZEUS-3D code, and a marginally 
documented "version 1" of ZEUS-MP made publicly 
available in 1999. The new version of ZEUS-MP we 
present is the first ZEUS code to unite 3D hydrodynam- 
ics (HD) and 3D magnetohydrodynamics (MHD) with 
implicit flux-limited radiation diffusion (FLD) and self- 
gravity in a software framework designed for execution 
on massively parallel architectures. This last feature an- 
ticipates a second major reason for offering a new method 
paper: the computing hardware landscape in which nu- 
merical astrophysicists operate has changed enormously 
since the ZEUS-2D trilogy was published. The enormous 
increase in computer processor speed and available mem- 
ory has brought with it a new paradigm for computing 
in which large simulations are distributed across many 
(often hundreds to thousands) of parallel processors; this 
new environment has spawned additional figures of merit, 
such as parallel scalability, by which modern numerical 
algorithms must be judged. A major component of this 
paper is the demonstration of the suitability of ZEUS- 
MP 's algorithms, both new and familiar, for parallel ex- 
ecution on thousands of processors. 

In addition to describing the many new features and 
capabilities provided in ZEUS-MP, this paper fills sig- 
nificant gaps in the evolution history of the MHD and 
radiation modules offered in predecessors of ZEUS-MP. 
These gaps were partially a consequence of the evolu- 
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tion of the ZEUS series. The first code formally named 
ZEUS was developed by David Clarke (Clarke 1988; 
Clarke et al. 1986) for MHD simulations of radio jets. 
Thereafter, continued development of the ZEUS method 
proceeded along two parallel tracks. One track resulted 
in the release of the ZEUS-2D code, which solves the 
equations of self-gravitating radiation magnetohydrody- 
namics in two or 2.5 dimensions. ("2.5-D" denotes a 
problem computed in 2 spatial dimensions involving the 
3-component of a vector quantity, such as velocity, that is 
invariant along the 3-axis but variable along the 1- and 
2-axcs.) The creation of ZEUS-2D occasioned the de- 
velopment and incorporation of several new algorithms, 
including (1) a covariant formulation, allowing simula- 
tions in various coordinate geometries; (2) a tensor arti- 
ficial viscosity; (3) a new, more accurate MHD algorithm 
(MOCCT) combining the Constrained Transport algo- 
rithm (Evans & Hawley 1988) with a Method Of Char- 
acteristics treatment for Alfven waves; and (4) a variable 
tensor Eddington factor (VTEF) solution for the equa- 
tions of radiation hydrodynamics (RHD). The VTEF ra- 
diation module was described in Stone et al. (1992b) but 
not included in the version of ZEUS-2D offered for pub- 
lic distribution. An implicit FLD-based RHD module 
was publicly distributed with ZEUS-2D but never doc- 
umented in a published report. (A draft of a technical 
report describing the 2-D FLD module is available on 
the World Wide Web^.) The VTEF module described 
in Stone et al. (1992b) was later modified to incorporate 
a different matter-radiation coupling scheme, Eddington 
tensors computed from a time-dependent solution to the 
radiation transfer equation, and parallelization over two 
space dimensions and one angle cosine. This new VTEF 
code was coupled to an early version of ZEUS-MP and 
used to compare VTEF solutions to FLD results for 
test problems featuring strongly beamed radiation fields. 
This work was published by Hayes & Norman (2003), but 
as before the VTEF module remained reserved for pri- 
vate use. ZEUS-MP as described herein provides an up- 
dated FLD module designed for all dimensionalities and 
geometries in a parallel computing environment. This 
paper documents our new module at a level of detail 
which should aid users of both ZEUS-MP and the pub- 
lic ZEUS-2D code. 

The second track resulted in the release of ZEUS- 
3D, the first ZEUS code capable of three-dimensional 
simulations. Written for the Cray-2 supercomputer by 
David Clarke, ZEUS-3D physics options included hydro- 
dynamics, MHD, self-gravity, and optically thin radia- 
tive cooling. ZEUS-3D was the first ZEUS code with 
parallel capability, accomplished using Cray Autotask- 
ing compiler directives. The ZEUS-3D MHD module 
differed from that described in the ZEUS-2D papers 
with regard to both dimensionality and method: the 
MOC treatment of Alfven waves was modified to in- 
corporate the improvements introduced by John Hawley 
and James Stone (Hawley & Stone 1995). This modified 
"HSMOCCT" method is the basis for the MHD module 
adopted in ZEUS-MP. 

Roughly speaking, ZEUS-MP encapsulates a qualita- 
tive union of the methods provided by the public ZEUS- 
2D and ZEUS-3D codes and enlarges upon them with 

* http://cosmos.ucsd.edu/lca-publications/LCA013/ index.html 



new capabilities and solution techniques. The first public 

release of ZEUS-MP included HD, MHD, and self gravity, 
but was written exclusively for 3-D simulations, exclud- 
ing at one stroke a long menu of worthy 2-D research 
applications and erecting an inconvenient barrier to ex- 
pedient code testing. The new version we describe offers 
a substantially extended menu of physics, algorithm, and 
dimensionality options. The HD, MHD, and RHD mod- 
ules accommodate simulations in 1, 1.5, 2, 2.5, and 3 
dimensions. Arbitrary equations of state are supported; 
gamma-law and isothermal equations of state are pro- 
vided. ZEUS-MP is the first ZEUS code to allow muhi- 
species fluids to be treated; this is achieved with the addi- 
tion of a concentration field array dimensioned for an ar- 
bitrary number of fluids. An implicit flux-limited photon 
diffusion module is included for RHD problems. As noted 
previously, the implicit FLD solution is based upon the 
method adopted for the public version of ZEUS-2D but 
has been extended to three dimensions. In addition, the 
FLD module sits atop a scalable linear system solver us- 
ing the conjugate gradient (CG) method (Barret et al. 
1994). Supplementing the FLD driver is an opacity mod- 
ule offering simple density and temperature-dependent 
power-law expressions for absorption and scattering co- 
efficients; additional user-supplied modules (such as tab- 
ular data sets) are easily accommodated. Self-gravity is 
included in several ways: (1) spherical gravity (GM/r) is 
adopted for one-dimensional problems and may also be 
used in two dimensions; (2) two parallel Poisson solvers 
are included for problems with Neumann or Dirichlet 
boundary conditions in Cartesian or curvilinear coordi- 
nates, and (3) a fast Fourier Transform (FFTw) package 
is provided for problems with triply-periodic boundaries. 
In addition to self-gravity, a simple point-mass external 
potential may be imposed in spherical geometry. 

While the ZEUS code line has evolved significantly, this 
process has not occurred in a vacuum. Indeed, the past 
decade has seen the emergence of several new MHD codes 
based upon Godunov methods (Ryu et al. 1998; Londrillo 
& Del Zanna 2000; Londrillo & del Zanna 2004; Balsara 
2004; Gardiner & Stone 2005). Godunov-type schemes 
are accurate to second order in both space and time, are 
automatically conservative, and possess superior capabil- 
ity for resolving shock fronts when compared to ZEUS- 
type schemes at identical resolution. These advances, 
coupled with ZEUS's lower-order rate of convergence 
with numerical resolution (Stone et al. 1992a), and fur- 
ther with recent observations (Falle 2002) of vulnerabil- 
ities in the ZEUS-2D implementation of MOCCT might 
lead one to ask whether a new ZEUS code has a strong 
potential for future contributions of significance in areas 
of current astrophysical interest. We argue this point 
in the affirmative on two fronts. While we acknowledge 
that the ZEUS method possesses (as do all numerical 
methods) weaknesses which bound its scope of applicabil- 
ity, we note the encouraging comparisons (Stone & Gar- 
diner 2005) of simulation results computed with Athena, 
an unsplit MHD code coupling PPM hydrodynamics to 
Constrained Transport, to ZEUS results in problems of 
supersonic MHD turbulence in 3D, and in shearing-box 
simulations of the magneto-rotational instability in two 
and three dimensions. The authors confirm the relia- 
bility of the ZEUS results and note that the numerical 
dissipation in Athena is equivalent to that of ZEUS at a 



3 



grid resolution of about 1.5 along each axis. In a sim- 
ilar vein, wo examine a standard 2-D MHD tost prob- 
lem, due to Orszag & Tang (1979), in which the results 
from ZEUS-MP are found to compare quite favorably 
with those from the TVD code described in Ryu et al. 
(1998) and with results from different upwind Godunov 
codes presented by Dai & Woodward (1998) and Lon- 
drillo & Del Zanna (2000). 

A second cause for optimism of ZEUS's future arises 
from the versatility inherent in ZEUS-MP 's design. As 
this paper demonstrates, a wide variety of physics mod- 
ules are easily implemented within ZEUS-MP's design 
framework. Additionally, different solution techniques 
for treating the same physics (self-gravity is an excellent 
example) are almost trivially accommodated. The theme 
of this paper is, therefore: physics, flexibility, and paral- 
lel performance. To demonstrate these traits we organize 
the paper as follows: the presentation begins by writing 
the fluid equations solved by ZEUS-MP in section 2; sec- 
tion 3 provides a largely descriptive overview of the var- 
ious numerical methods used to solve these equations on 
a discrete mesh. Sections 4 and 5 present two groups of 
test problems. Section 4 provides a suite of tests to verify 
the correctness of the various physics modules. Section 5 
examines a quartet of 3-D problems which measure per- 
formance on a massively parallel computer. 

The main body of the paper is concluded with a sum- 
mary in section 6. Full documentation of the finite- 
difference formulae describing hydrodynamic, magneto- 
hydrodynamic, and radiation hydrodynamic evolution in 
three dimensions is provided in a series of appendices. 
These appendices arc preceded by a tabular "code map" 
(Table Al) in appendix A associating the discrete equa- 
tions written in appendices B-E with the subroutines 
in ZEUS-MP that compute them. We offer this as an 
aid to users wishing to familiarize themselves with the 
code and to code developers who desire to implement 
their own versions or improve upon what is currently 
offered in ZEUS-MP. The HD, MHD, and FLD mod- 
ules are detailed in appendices B, C, and D, respectively. 
Appendix E documents the 3-D linearized form of Pois- 
son's equation expressed in covariant grid coordinates, 
and details of our parallel implementation techniques and 
strategies are given in appendix F. 

2. THE PHYSICAL EQUATIONS 

Our description of the physical state of a fluid element 
is specified by the following set of fiuid equations relating 
the mass density (p), velocity (v), gas internal energy 
density (e), radiation energy density (E), radiation fiux 
(F), and magnetic field strength (B): 
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The Lagrangean (or comoving) derivative is given by 
the usual definition: 
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The four terms on the RHS of the gas momentum equa- 
tion (2) denote forces to due thermal pressure gradients, 
radiation stress, magnetic Lorentz acceleration, and the 
gravitational potential, respectively. The RHS of (3) 
gives source/sink terms due to absorption/emission of 
radiant energy. Each term serves an inverse role on the 
RHS of (4). In (3), Bp denotes the Planck function: 
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where T is the local material temperature. Equations 

(3) and (4) are also functions of flux-mean, Planck-mean, 
and energy-mean opacities, which are formally defined as 
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In the simple problems we discuss in this paper, the three 
opacities are computed from a single expression, differing 
only in that ke and Kp are defined at zone centers and % 
is computed at zone faces. In general, however, indepen- 
dent expressions or data sets for the three opacities are 
trivially accommodated in our discrete representation of 
the RHD equations. 

We compute the radiation flux, F, according to the 
diffusion approximation: 



F = -('^)VE, 



(11) 



where we have introduced a flux-limiter (Ae) to ensure 
a radiation propagation speed that remains bounded by 
the speed of light (c) in transparent media. Attenuation 
of the flux is regulated by the total extinction coefficient, 
X, which in general may include contributions from both 
absorption and scattering processes. 

Equation (4) also includes a term involving the radia- 
tion stress tensor, P. In general, P is not known a pri- 
ori and must be computed from a solution to the radia- 
tive transfer equation. In the VTEF methods described 
in Stone et al. (1992b) and Hayes fc Norman (2003), P is 
written as the product of the (known) radiation energy, 
E, and an (unknown) tensor quantity, f: 

P = fE. (12) 

A solution for f may be derived from a formal solution to 
the radiative transfer equation (Mihalas 1978) or may be 
approximated analytically. For the RHD implementation 
in ZEUS-MP, we follow the latter approach, adopting the 
expression for f given by equation (13) in Turner & Stone 
(2001): 
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where fi = VE/ | VE |, I is the unit tensor, and / is a 
scalar "Eddington factor" expressed as a function of the 
flux limiter, Ae, and E as 

/ = A, + (14) 



ZEUS-MP 



The RHD equations arc accurate only to order unity in 
v/c (Mihalas & Mihalas 1984), consistent with the radi- 
ation modules described in Stone et al. (1992b), Turner 
& Stone (2001), and Hayes & Norman (2003). The as- 
sumption of local thermodynamic equilibrium is reflected 
in our adoption of the Planck function in equations (3) 
and (4) evaluated at the local material temperature; our 
use of the FLD approximation gives rise to equation (11) 
and is discussed further in § 3.3. 

Evolution of the magnetic field (5) is constrained by the 
assumption of ideal MHD. We are therefore, in common 
with previous public ZEUS codes, assuming wave modes 
which involve fluid motions but no charge separation. 
This equation for B also assumes zero resistivity, a valid 
approximation in astrophysical environments where the 
degree of ionization is sufficiently high to ensure a rapid 
collision rate between charged and neutral particles and 
thus strong coupling between the two. There exist astro- 
physical environments where this assumption is expected 
to break down (e.g. the cores of cold, dense molecular 
clouds), in which case an algorithm able to distinguish 
the dynamics of ionized and neutral particles is required. 
Stone (1999) published extensions to the ZEUS MHD 
algorithm to treat nonidcal phenomena such as Ohmic 
dissipation and partially ionized plasmas where the ionic 
and neutral components of the fluid are weakly coupled 
via a coUisional drag term. Incorporation of these algo- 
rithmic extensions into ZEUS-MP is currently left as an 
exercise to the interested user, but may be undertaken 
for a future public release. 

The gravitational potential $ appearing in (2) is com- 
puted from a solution to Poisson's equation: 

= A-KGp. (15) 

Our various techniques for solving (15) are described 
in §3.4; the linear system which arises from discretiz- 
ing (15) on a covariant coordinate mesh is derived in 
appendix E. 

Our fluid equations are closed with an equation of state 
(EOS) expressing the thermal pressure as a function of 
the internal gas energy. The dynamic test problems con- 
sidered in this paper adopt a simple ideal EOS with 7 = 
5/3 except where noted. 

3. NUMERICAL METHODS: AN OVERVIEW 

Figure 1 summarizes the dependencies of ZEUS-MP's 
major physics and I/O modules on underlying software 
libraries. The Message Passing Interface (MPI) software 
library is used to implement parallelism in ZEUS-MP and 
lies at the foundation of the code. This library is avail- 
able on all NSF and DOE supercomputing facilities and 
may be freely downloaded^ for installation on small pri- 
vate clusters. ZEUS-MP's finear system solvers and I/O 
drivers access MPI functions directly and act in service 
of the top layer of physics modules, which are described 
in the following subsections. 

^ http://www-unix.mcs.anl.gov/mpi/mpich/ 
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Fig. 1. — Software implementation within ZEUS-MP. 

3.1. Hydrodynamics 

Equations (1) through (5) provide the most complete 
physical description which may be invoked by ZEUS- 
MP to characterize a problem of interest. There exist 
classes of problems, however, for which either radiation 
or magnetic fields (or both) are not relevant and thus may 
be ignored. In such circumstances, ZEUS-MP evolves an 
appropriately reduced subset of field variables and solves 
only those equations needed to close the system. ZEUS- 
MP may therefore define systems of equations for pure 
HD, MHD, RHD, or RMHD problems as necessary. We 
begin the description of our numerical methods by con- 
sidering purely hydrodynamic problems in this section; 
we introduce additional equations as we expand the dis- 
cussion of physical processes. 

Tracing their ancestry back to a two-dimensional, Eu- 
lerian hydrodynamics (HD) code for simulations of rotat- 
ing protostellar collapse (Norman et al. 1980), all ZEUS 
codes are rooted in an HD algorithm based upon the 
method of finite differences on a staggered mesh (Norman 
1980; Norman & Winkler 1986), which incorporates a 
second order-accurate, monotonic advection scheme (van 
Leer 1977). The basic elements of the ZEUS scheme arise 
from consideration of how the evolution of a fluid ele- 
ment may be properly described on an adaptive mesh 
whose grid lines move with arbitrary velocity, Vg. Fol- 
lowing the analysis in Winkler et al. (1984), we identify 
three pertinent time derivatives in an adaptive coordi- 
nate system: (1) the Eulerian time derivative (d/dt), 
taken with respect to coordinates fixed in the laboratory 
frame, (2) the Lagrangean derivative [D/Dt; cf. equa- 
tion 6), taken with respect to a definite fluid element, 
and (3) the adaptive-mesh derivative {d/dt), taken with 
respect to fixed values of the adaptive mesh coordinates. 
Identifying dV as a vohime element bounded by fixed val- 
ues of the adaptive mesh and dS as the surface bounding 
this element, one may employ the formalism of Winkler 
et al. (1984) to split the fluid equations into two distinct 
solution steps: the source step, in which we solve 
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and the transport step, whence 
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where Vg is the local grid velocity. Equations (16) and 
(17) have been further modified to include an artificial 
viscous prcssiirc, Q. ZEUS-MP employs the method due 
to von Neumann & Richtmycr (1950) to apply viscous 
pressure at shock fronts. This approach is known to pro- 
vide spurious viscous heating in convergent coordinate 
geometries even when no material compression is present. 
Tscharnutcr & Winkler (1979) describe a tensor formal- 
ism for artificial viscosity which avoids this problem; an 
implementation of this method will be provided in a fu- 
ture release of ZEUS-MP. For problems involving very 
strong shocks and stagnated flows, the artificial viscosity 
may be augmented with an additional term which is lin- 
ear in velocity and depends upon the local sound speed. 
The precise forms of the quadratic and linear viscosity 
terms are documented in Appendix B. 

3.2. MHD 

The treatment of MHD waves in ZEUS-MP is by ne- 
cessity more complex than that for HD waves because 
MHD waves fall into two distinct families: (1) longitudi- 
nal, compressive (fast and slow magnetosonic) ; and (2) 
transverse, non- compressive (Alfven) waves. The former 
family may be treated in the source-step portion of the 
ZEUS solution in a similar fashion to their hydrodynamic 
analogs, but the latter wave family couples directly to 
the magnetic induction equation and therefore requires a 
more complex treatment. From the algorithmic perspec- 
tive, the inclusion of MHD into the ZEUS scheme has two 
consequences: (1) fluid accelerations due to compressive 
MHD waves introduce additional terms in equation (16); 
(2) fluid accelerations due to transverse MHD introduce 
additional velocity acceleration terms which, owing to 
the coupling to the induction equation, are computed 
in a separate step which follows the source step update 
but precedes the "transport" (i.e. fluid advection) up- 
date. In this way, the updates due to fluid advection 
are deferred until all updates to the velocities are prop- 
erly recorded. As will be extensively described in this 
section, the fluid accelerations due to transverse MHD 
waves are combined with the evolution equation for B 
because of the tight mathematical coupling between the 
two solutions. 

We guide the following discussion by providing, in the 
continuum limit, the flnal result. With the inclusion of 
MHD, the source/transport solution sequence expands 
to begin with an MHD-augmented "source" step: 
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This is followed by an MOCCT step, whence 



i9v 



dt 



Is 



final 



9v 



source step 



+ -(B.V)B;(23) 



BdS = edl, 



(24) 



where e is the electromotive force (EMF) and is given by 
e = (v- Vg)xB. (25) 
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Fig. 2.— Program control for ZEUS-MP. 

With velocities and B fields fully updated, we then 
proceed to the "transport" step as written in equations 
(18) through (20). 

Figure 2 shows the program control logic used to im- 
plement the solution outlined by equations (21)- (24) and 
the previously-written expressions (18)-(20). Once the 
problem has been initialized, Poisson's equation is solved 
to compute the gravitational potential. In the source 
step, updates due to longitudinal forces (thermal and 
magnetic) , radiation stress, artificial viscosity, energy ex- 
change between the gas and radiation field, and pdV 
work are performed at fixed values of the coordinates. 
Accelerations due to transverse MHD waves are then 
computed and magnetic field components are updated. 
Finally, velocities updated via (21) and (23) are used to 
advect field variables through the moving mesh in the 
transport step. Advection is performed in a series of di- 
rectional sweeps which are cyclically permuted at each 
time step. 

The remainder of this section serves to derive the new 

expressions appearing in (21), (23), and (24), and to doc- 
ument their solution. We begin by first considering the 
Lorentz acceleration term in the gas momentum equation 
(2). Using the vector identity 

(VxB)xB = -V(5V2) + (B-V)B (26) 

we expand the Lorentz acceleration term such that (2) 
becomes (ignoring radiation) 
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Dt 



i-(B.V)B 



= -Vp - V(SV87r) 

-pV$. (27) 

The second term on the RHS of (27) is the gradient 

of the magnetic pressure. This term, which provides the 
contribution from the compressive magnetosonic waves, 
is clearly a longitudinal force and is differenced in space 
and time identically to the thermal pressure term. This 
expression is thus evaluated simultaneously with the 
other contributions to the "source step" portion of the 
momentum equation (equation 21); contributions from 
the magnetic pressure to the discrete representation 
of (21) along each axis are shown in equations (BIO) 
through (B12), with notational definitions provided by 
expressions (B19) through (B30), in appendix B. 

The third term on the RHS of (27) represents mag- 
netic tension in curved field lines and is transverse to the 
gradient of B. This term, which is added to the solu- 
tion sequence as equation (23) couples to the magnetic 
induction equation to produce Alfven waves; the mag- 
netic tension force and the induction equation (5) are 
therefore solved by a single procedure: the Method of 
Characteristics + Constrained Transport (MOCCT). 

ZEUS-MP employs the MOCCT method of Hawley & 
Stone (1995), which is a generalization of the algorithm 
described in Stone & Norman (1992b) to 3D, with some 
slight modifications that improve numerical stability. To 
describe MOCCT, we first derive the moving frame in- 
duction equation. Recall that equation (5) is derived 
from Faraday's law 

'dt 

where E, B and the time derivative are measured in the 
Eulerian frame. The electric field E is specified from 
Ohm's law 

cE = -vxB + J/cr. (29) 

Equation (5) results when we substitute equation 29 into 
equation 28, and let the conductivity a ^ oo. Inte- 
grating equation 28 over a moving surface clement S{t) 
bounded by a moving circuit C{t), the general form of 
Faraday's law is 

|/^B.dS = -c£E'.dl (30) 

where E' is the electric field measured in the moving 
frame. To first order in v/c, E' = E + (vgXB)/c. 
From equation 29, for a perfectly conducting fluid E = 
— vxB/c. Combining these two results and substituting 
into equation 30, we get 



— = -cVxE, 



(28) 



4- f B dS= /(v-v„)xB dl. 
dt Js Jc ^ 



(31) 

Is Jc 
Equation (31) states that the time rate of change of 
the magnetic flux piercing 5* 



B dS 



(32) 



is given by the line integral of the electromotive force 
(EMF) e = (v - Vg)xB along C: 



dt 



•dl. 



(33) 



El(ij,k+11 



E2(ij.k+1) 




Fig. 3. — Centering of magnetic field variables in ZEUS-MP. 

Equation (33), using (32), is equivalent to expression (24) 
appearing in our grand solution outline, and it forms, 
along with equation (23), the target for our MOCCT 
algorithm. Equation (33) is familiar from standard texts 
on electrodynamics, only now S and C are moving with 
respect to the Eulerian frame. If Vg = v, we recover the 
well known flux-freezing result, d(f)s/dt = D(j>s/Dt = 0. 

As discussed in Evans & Hawley (1988); Stone & Nor- 
man (1992b); Hawley & Stone (1995), equation (33) is in 
the form which guarantees divergence-free magnetic field 
transport when finite differenced, provided the EMFs are 
evaluated once and only once per time step. Referring to 
the unit cell shown in Figure 3, we can write the discrete 
form of equation (33) as 



i-t n-t-l 



(/)lf 



At 



'-^'^i,j,k^^'^i,j + e3jj_|_i^fcAa;3jj+i 
■e2 



i,j,k+l 



Ax2i i - e3 



(1)2. 



At 



-clij,k+iAxli 
■ elij^kAxli - 



j,feAa;3ij,fe;(34) 

+ e3i,j+i,feAa;3i,j,fe 
e3i+i,j,fcAa;3i+ij,fc(35) 



At 



eli.o fcAxli + e2. 



i-\-l.j,k 



Ax2. 



- elij+i^kAxli - e2ij^kAx2ij, (36) 

where (/>1, 02, 03 are the face-centered magnetic fluxes 
piercing the cell faces whose unit normals are in the 
ni,n2,n3 directions, respectively, el,e2,e3 are compo- 
nents of the edge-centered EMFs, and Axl, Ax2, Ax3 are 
the coordinate distances of the cell edges. The peculiar 
subscripting of these line elements is made clear in Ap- 
pendix C. It is easy to sec that for any choice of EMFs, 
B"+^ will be divergence- free provided B" is divergence- 
free. By Gauss's theorem, Jy V-BrfV = B • dS = 0, 
where the second equality follows from the fact V-B = 0. 
Analytically, the time derivative of V-B is also zero. Nu- 
merically, 



di 



B dS; 



^ d(l}/dt 

faces=l 



E E -dl = 0. 



(37) 



faces=l edges=l 
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The last equality results from the fact that when sum- 
ming over all the faces and edges of a cell, each EMF 
appears twice with a change of sign, and thus cancel in 
pairs. 

In principle, one could use any method to compute 
the EMF within the CT formalism and still main- 
tain divcrgcncc-frcc fields. In practice, a method must 
be used which stably and accurately propagates both 
MHD wave types: longitudinal, compressive (fast and 
slow magnetosonic) waves, and the transverse, non- 
compressive (Alfven) waves. As noted previously, the 
first wave type is straightforwardly incorporated into the 
treatment of the compressive hydrodynamic waves; the 
real difficulty arises in the treatment of the Alfven waves. 

In ideal MHD, Alfven waves can exhibit discontinuities 
(rotational, transverse) at current sheets. Unlike hydro- 
dynamical shocks, these structures are not dissipative, 
which rules out the use of dissipative numerical algo- 
rithms to model them. In addition. Alfven waves tightly 
couple the evolution equations for the velocity and mag- 
netic field components perpendicular to the direction of 
propagation. This rules out operator operator splitting 
these components. Finally, we need an algorithm that 
can be combined with CT to give both divergence-free 
transport of fields and correct local dynamics. This will 
be achieved if the EMFs used in the CT scheme con- 
tain information about all wave modes, which for sta- 
bility, must be appropriately upwinded. These multiple 
requirements can be met using the Method of Character- 
istics (MOC) to compute the EMFs. The resulting hy- 
brid scheme is MOCCT (Stone & Norman 1992b; Hawley 
& Stone 1995). 

Schematically, the EMFs can be written as (ignoring 
Vg for simplicity) 

eli,j,fe = i'2* ■ i. 63, 



v3. 



b2* 



i.j,k i.j,k 



= w3. 



61* 



1 * 



1 * 



62, 



(38) 
(39) 
(40) 



where the starred quantities represent time-centered val- 
ues for these variables resulting from the solution of 
the characteristic equations at the centers of zone edges 
where the EMFs are located. To simplify, we apply MOC 
to the Alfven waves only, as the longitudinal modes are 
adequately handled in a previous step by finite difference 
methods. 

Because the MOC is applied only to transverse waves, 
we may derive the appropriate differential equations by 
considering the 1-D MHD wave equations for an incom- 
pressible fluid (Stone & Norman 1992b) which reduce to 
dv B,dB d 

= — ^ — -^y^^^)^ (41) 

at p ox ax 

where we have used the divergence-free constraint in 
one dimension (which implies dBx/dx = 0) and the 
non-compressive nature of Alfven waves (which implies 
dvx/dx = 0). 

We can rewrite the coupled equations (41) and (42) 
in characteristic form by multiplying equation (42) by 
and then adding and subtracting them, yielding 



£3 

1,;+!,* 


1 ^ 










i 




-"—5 




i — ^ 




1 


mm 





Fig. 4. — A two dimensional (xl — x2) slice through the unit cell 
in Figure 3 containing the four e3's. The computation of e3j j ^ is 
illustrated in Figure 5. 

The plus sign denotes the characteristic equation along 
the forward facing characteristic C+, while the minus 
sign denotes the characteristic equation along the back- 
ward facing characteristic C~ . The comoving derivative 
used in equation (43) is defined as 



V/Vt = d/dt + T Bx/p^''^)d/dx, 



(44) 



where the minus (plus) sign is taken for the comoving 
derivative along the C^(C~) characteristic. Note that 
the coefficient of the second term in equation (44) is 
just the Alfven velocity in the moving fluid, Vx ± va- 
Physically, equations (43) state that along characteris- 
tics, which are straight lines in spacetime with slopes 
Vx ± VA, the changes in the velocity and magnetic field 
components in each direction are not independent. 

The finite-difference equations used to solve the charac- 
teristic equations (43) can be generated as follows. Con- 
sider the one dimensional spacer-time diagram centered at 
the position of one of twelve edge-centered EMFs where 
we require the values v*,B* (see Figure 5). Extrapolat- 
ing back in time along the characteristics C"*" and C~ 
to time level n defines the "footpoints" . By using up- 
wind van Leer (1977) interpolation, we can compute the 
time-averaged values for these variables in each domain 
of dependence. For both the velocities and the magnetic 
fields the characteristic speed Vx ± va are used to com- 
pute the footpoint values t^^'", B^'"\v^'" , B^'"'. The fi- 
nite difference equations along C+ and C~ become 

« - + {B* - B+n/ipW = 0, (45) 
« - - {B* - Br.")/(pr)V2 ^ 0^ (4g) 

where the subscript i refers to cell i, not the i-th compo- 
nent of the vectors v, B. For simplicity, we set = p^_i 
and p^ = Pi - The two linear equations for the two un- 
knowns and B^ are then solved algebraically. 

For our multidimensional calculations, the character- 
istic equations are solved in a directionally split fashion 
on planes passing through the center of the cell and the 
cell edges where the EMFs arc to be evaluated. To illus- 
trate, consider the calculation of e-'ii,j,k (Eq. 40). The 
plane parallel to the xl — x2 plane passing through the 
four e3's in Figure 3 is shown in Figure 4. Evaluating 
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Fig. 5. — The computation of eSij^k involves tlic solution of two 
1-D characteristic equations for Alfven waves confined to the {xl — 
x2) plane. The space-time diagrams for the solution of 62* , 112* 
due to Alfven waves propagating parallel to the xl axis, and for 
the solution of 61*, fl* due to Alfven waves propagating parallel 
to the x2 axis are displayed. 

e'Sij^k roquircs values for hi* , vl* , 62*, and v2* at the cell 
corner {xli,x2j,x'Sk)- First, as outlined above, one com- 
putes b2*,v2* by solving the characteristic equations for 
Alfven waves propagating in the xl direction (Figure 5). 
When calculating the location of the footpoints, vl and 
the Alfven speed arc averaged to the zone corner. Then, 
the procedure is repeated for bl*,vl* by solving the char- 
acteristic equations for Alfven waves propagating in the 
x2 direction, using v2 and the Alfven speed averaged to 
the zone corner. Once all the e3's are evaluated in this 
way, the analogous procedure is followed for slices con- 
taining el and e2. Only after all the EMFs have been 
evaluated over the entire grid can the induction equation 
equation be updated in a divergence-free fashion. 

Finally, we consider how the fluid momentum is up- 
dated due to the Lorentz Force. The key point is that 
we do not want to throw away the fluid accelerations aris- 
ing from Alfven waves that are implicit in the solution to 
the characteristic equations (45) and (46). For example, 
the acceleration of v3 by the transverse magnetic forces 
is given by 



dV3 



1 



V(3)(BV87r) + — (B- V)S3, (47) 



or in terms of the EMFs: 



Pi,j,k{- 



v3 



n+l 
»J>fc 



At 



' 47r^ ^ Axl 



63 



Ax2 



where 61 and 62 are four-point averages of the magnetic 
field to the spatial location of v3, and the 63* 's are those 

values that enter into the EMF's referred to in the sub- 
scripts (see Figure 3.) Similarly, the magnetic pressure 
calculation is 



Pi,j,k{ 



At 



"47r'^"^*'Aa;3' 



(49) 

The evaluation of the EMF's outlined by equations (38) 
through (40) has been modified according to a prescrip- 
tion due to Hawley & Stone (1995), in which each of the 



two vB product terms is computed from a mix of quanti- 
ties computed directly from the characteristic criuations 
with quantities estimated from simple advection. Full 
details are provided in Appendix C, but the idea may 
illustrated by an appropriate rewrite of (40) for the eval- 
uation of eSij^k- 

e3i,j,fe = 0.5* {vllj^k b2ij^k + vlij^k &2*,, fc) 

- 0.5 * {v2l^^, bli,j,k + ^i,j,k 61*,- fe) , (50) 

where the starred quantities are derived from character- 
istic solutions and the barred quantities arise from up- 
winded averages along the appropriate fluid velocity com- 
ponent. This modification (which engenders the "HS" in 
"HSMOCCT") introduces a measure of diffusivity into 
the propagation of Alfven waves which is not present in 
the MOC scheme described in Stone & Norman (1992b). 
Hawley & Stone (1995) note that this change resulted 
in a more robust algorithm when applied to fully multi- 
dimensional problems characterized by strong magnetic 
discontinuities. 

3.3. Radiation Diffusion 

The inclusion of radiation in the system of fluid equa- 
tions to be solved introduces four changes in the numer- 
ical solution. First, an additional contribution to the 
sotirce-step momentum equation arises from the radia- 
tion flux: 

Second, new source/sink terms appear in the source-step 
gas energy equation: 

de 

— = ckeE - 47rKpBp - pV-v - Vv : Q. (52) 

Third, a new source-step equation is added for the radi- 
ation energy density: 

dE 

— = 47rKpBp - ckeE - V-F - Vv : P; (53) 

and fourth, an additional advection equation for E is 
added to the transport step: 



^ [ EdV = - (f E (v - Vg) • dS. 

Jv JdV 



(54) 



In comparing equations (51) - (53) with cither the pure 
HD equations (16) - (17) or the MHD analogs (21) - (24), 
it is clear that the inclusion of radiation physics may be 
made to the HD or MHD systems of equations with equal 
ease. Similary, the transport step is augmented with a 
solution of (54) in either scenario. 

ZEUS-MP computes the evolution of radiating fluid 
flows through an implicit solution to the coupled gas and 
radiation energy equations (52) and (53). Rather than 
solve the time-dependent radiation momentum equation 
and treat the flux, F, as an additional dependent vari- 
able, we adopt the flux-limited diffusion (FLD) approxi- 
mation as shown in (11). This allows an algebraic substi- 
tution for F in the flux-divcrgcncc term of the source-step 
equation (53) for E and avoids the need for an additional 
advection equation for the flux. The FLD approximation 
is an attractive choice for multidimensional RHD applica- 
tions for which local heating/ cooling approximations are 
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inadequate. With regard to computational expense, FLD 
offers cnormoiis economy relative to exact Boltzmann so- 
lutions for the photon distribution function because the 
dimensionality of the problem is reduced by 2 when the 
angular variation of the radiation field is integrated away. 
Additionally, the mathematical structure of the FLD 
equations makes the solution amenable to parallel imple- 
mentation. Fundamentally, however, the flux-limiter is 
a mathematical construction which interpolates between 
the limiting cases of transparency and extreme opacity 
in a manner that (hopefully) retains sufficient accuracy 
in the more difficult semi-transparent regime. Precisely 
what constitutes "sufficient accuracy" is dictated by the 
needs of the particular application, and the techniques 
for meeting that requirement may likewise depend upon 
the research problem. Levermore & Pomraning (1981) 
(LP) constructed an FLD theory which derived a form 
of Ae widely adopted in astrophysical applications (and 
in this paper). In their work, LP use simple test problems 
to check the accuracy of their FLD solution against ex- 
act transport solutions. In simulations of core-collapse 
supernovac, Licbendorfcr et al. (2004) compared cal- 
culations employing energy-dependent multi-group FLD 
(MGFLD) calculations against those run with an exact 
Boltzmann solver and found alternate forms of the lim- 
iter which better treated the transport through the semi- 
transparent shocked material in the post-bounce envi- 
ronment. These calculations and others have shown that 
FLD or MGFLD techniques can yield solutions that com- 
pare favorably with exact transport, but that a "one size 
fits all" prescription for the flux limiter is not to be ex- 
pected. 

In the context of applications, two other vulnerabilities 
of FLD bear consideration. Hayes & Norman (2003) have 
compared FLD to VTEF solutions in problems character- 
ized by highly anisotropic radiation fields. Because the 
FLD equation for flux is a function of the local gradient 
in E, radiation tends to flow in the direction of radiation 
energy gradients even when such behavior is not physi- 
cally expected, as in cases where radiation from a distant 
source is shadowed by an opaque object. In applications 
where the directional dependence of the radiation is rel- 
evant (Turner et al. (2005) discuss a possible example), 
an FLD prescription may be of limited reliability. A 
second vulnerability concerns numerical resolution. As 
discussed in detail by Mihalas & Mihalas (1984), the dif- 
fusion equation must be flux-limited because numerically, 
the discrete diffusion equation advances a radiation wave 
one mean-free path (A) per time step. Flux limiters are 
designed to act when \/ At exceeds c, which ordinar- 
ily one expects in a transparent medium. A problem 
can arise in extremely opaque media however, when the 
physical mean-free path is much smaller than the width 
of a grid cell. In this case. A is unresolved, and the effec- 
tive propagation length scale is now determined by the 
much larger zone size. Because the signal speed is much 
less than c, the flux-limiter provides no constraint on 
the propagation of radiation. This can lead to unphys- 
ically rapid heating of irradiated boundary layers which 
are spatially unresolved. This problem has long bedev- 
iled terrestrial transport applications; whether this rep- 
resents a liability for a given astrophysical application 
should be carefully assessed by the user. 

We consider now some basic details of our FLD mod- 



ule. In our RHD prescription, matter and radiation may 
exchange energy through absorption and emission pro- 
cesses represented by the right-hand sides of equations 

(3) and (4), and the radiation stress term on the LHS of 

(4) . The radiation energy budget is further influenced by 
spatial transport, which we treat with the diffusion oper- 
ator. The high radiation signal speed in transparent me- 
dia mandates an implicit solution to the radiation energy 
equation. Coupling between the radiation and matter 
is treated via an iterative scheme based upon Newton's 
method. Recently, Turner & Stone (2001) published a 
new FLD module for the ZEUS-2D code. The physical 
assumptions underlying their method are consistent with 
those identified here, but the mathematical treatment for 
solving the coupled system differs from what we describe 
below. 

Our construction of a linear system for the 3-D RHD 
equations begins with expressions for the spatially and 
temporally discretized gas and radiation energy equa- 
tions. Consider the gas and radiation energy densities 
to be defined at discrete points along 3 orthogonal axes 
denoted by i, j, and fc; i.e. c ^i.j,k and E Ej.j fe. 
We approximate the partial time derivative in terms of a 
time-centered difference between two adjacent time lev- 
els, i" and At = - i". We then define two 
functions in eij.k and 



T7" 



■At 



At 



47rKpBp — ckeE" 



1 

i,j,k 



(55) 



A2) 

J n 



_ n+1 
,j,k iy3,k 



-At 



-47rKpBp + CKE^l+l 



-hAipV-v. (56) 

For notational economy, we have confined explicit ref- 
erence to coordinate indices and time level to the gas 

and radiation energy variables. As written above, the 

functions /^^^ and f^'^ji. are identically zero for a consis- 
tent solution for Bij^k and Eij^k- We employ a Newton- 
Raphson iteration scheme to find the roots of (56) and 
(55). To construct a discrete system of equations which 
is linear in both energy variables, we evaluate the E- 
dependent fiux limiter from values of E at the previous 
time level n. The thermal pressure, Planck functions, 
and opacities arc updated at each iteration. The veloci- 
ties are roughly (but not formally) time-centered, having 
been updated with contributions from body forces and 
artificial viscosity prior to the radiation solution (cf. fig- 
ure 2). We may write the linear system to be solved 
as 

J{x)6x = -f{x). (57) 

In (57), X is the vector of gas and radiation energy 
variables: x = (Ei.j,k,Bi.j,k)- Likewise, the solution 
vector Sx is the set of corrections to these variables, 
{6Eij^k,Scij^k)- f represents the vector of discrete func- 
tions {filj^k^fi^lk), and J {x) is the Jacobian, dpjdxK 

As written above, expression (57) represents a matrix 
of size (2N)x(2N), where N is the product of the num- 
bers of mesh points along each coordinate axis. As will 
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be shown in appendix D the corrections, Seij^k, to the 
gas energies may be analytically expressed as functions of 
the radiation energy corrections, SEij^k- This allows the 
solution of a reduced system of size NxN for the vector 
of SEij^k, from which the set of Seij^k are immediately 
obtained. These corrections are used to iteratively up- 
date the trial values of Ci.j fe and E^ j fe. Wc have found 
in a variety of test problems that typically 2 to 10 N-R 
iterations are required for a converged solution at each 
time step. 

Each iteration in the N-R loop involves a linear sys- 
tem which must be solved with matrix algebra. Our 
solution for Poisson's equation also requires a linear sys- 
tem solvcir. We discuss the types of linear solvers imple- 
mented in ZEUS-MP separately in §3.8, with additional 
details provided in appendices D, E, and F. 

3.4. Self Gravity 

ZEUS-MP treats Newtonian gravity at three different 
levels of approximation: (1) point-mass potentials, (2) 
spherically-symmetric gravity (V$ = GM/r^), and (3) 
exact solutions to Poisson's equation (15). The first two 
options are trivial to implement; in this discussion we 
therefore focus on the final option. In three dimensions, 
the discrete Laplacian operator connects a mesh point 
at coordinate (i,j,k) with both adjacent neighbors along 
each axis; thus a finite-differenced form of (15) takes the 
following form: 

ai^ij^k^i + a2^ij-i^k + a3$i_ij,fc + 

If (58) is defined on a Cartesian mesh in which the zone 
spacing along each axis is uniform, then the "a" coef- 
ficients in equation 58 are constant over the problem 
domain. If, in addition to uniform gridding, the prob- 
lem data is characterized by spatial periodicity, then Fast 
Fourier Transform (FFT) algorithms offer a highly effi- 
cient method for solving (58). For this class of problems 
(see Li et al. (2004) for a recent example), ZEUS-MP pro- 
vides an FFT module based upon the publicly avail- 
able "FFTw" software (Frigo & Johnson 2005). While 
FFT-based methods are not in general restricted only 
to periodic problems, we emphasize that the module im- 
plemented in ZEUS-MP is valid only for 3-D problems 
on uniform Cartesian meshes with triply-periodic bound- 
aries. 

For multidimensional problems which do not meet 
all of the validity criteria for the FFTw solver, ZEUS- 
MP provides two additional solution modules. The most 
general of these accommodates 2-D and 3-D grids in 
Cartesian, cylindrical, and spherical geometries and is 
based upon the same CG solver provided for the FLD 
radiation equations. A second module currently writ- 
ten for 3-D Cartesian meshes with Dirichlet or Neumann 
boundary conditions is based upon the multigrid (MG) 
method (cf. §3.8.2). 

When equation (58) is formulated as a function of 
ZEUS covariant grid variables, the matrix elements rep- 
resented by the "a" coefficients take on a much more com- 
plicated functional form than the constant values which 
obtain for a uniform Cartesian mesh. The form of (58) 
written for the general class of 3-D covariant grids is 



documented in appendix E. Details of all three solution 
techniques for Poisson's equation are written in §3.8. 

3.5. Multi-species Advection 

Prior public versions of ZEUS codes have treated the 
gas as a single species fluid. In order to be forward- 
compatible with physical processes such as chemistry, nu- 
clear reactions, or lepton transport, ZEUS-MP offers a 
straightforward mechanism for identifying and tracking 
separate chemical or nuclear species in a multi-species 
fluid mixture. Because ZEUS-MP solves one set of fluid 
equations for the total mass density, p, we have no fa- 
cility for modeling phenomena in which different species 
possess different momentum distributions and thus move 
with respect to one another. Nonetheless, a wide variety 
of astrophysical applications are enabled with a mech- 
anism for quantifying the abundances of separate com- 
ponents in a mixed fluid. Our multi-species treatment 
considers only the physical advection of different species 
across the coordinate mesh; physics modules which com- 
pute the local evolution of species concentrations (such 
as a nuclear burning network) must be provided by the 
user. 

Our implementation of multispecies advection pro- 
ceeds by defining a concentration array, X,i, such that 
pXn is the fractional mass density of species n. The ad- 
vection equations in the ZEUS transport step therefore 
include: 



d_ 
di 



[ (pX„) dV = - cf (pX„) (v - vg) • dS. 

Jv JdV 



(59) 



This construction is evaluated such that the mass fluxes 
used to advect individual species across the mesh lines 
are consistent with those used to advect the other field 
variables defined in the application. Discrete formulae 
for the conservative advection of Xn and the other hy- 
drodynamic field variables are provided in appendix B. 

3.6. Time Step Control 

Maintainence of stability and accuracy in a numerical 
calculation requires proper management of time step evo- 
lution. The general expression regulating the time step 
in ZEUS-MP is written 



■ Ccii 



1 



1 



1 



At; 



+ 



1 



al 



Atf 



+ 



Mi 



At 



-+ 



v3 



1/2 



(60) 



in which Cca is the Courant factor and the At terms are 
squares of the minimum values of the following quanti- 
ties: 



A4=7(7-l).(e/p) / (AX^i„)^ 
Atli = [{vl - vgl) I rfa;la]^ ; 
At 



v1 ■ 



A4 = [(«3- 



vgl) I {g2b dx2a)Y ; 

vg3) I (.g31& .g32& dx3a)]^ 



Ai^,= (i 



61% 62^63^) / 



4p {Axy 



(61) 
(62) 
(63) 

(64) 
(65) 
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Ati 



4 qcon 



dv 
dx 



At 



rd ■ 



ertol 



E 



AE 



max. 
2 



(66) 

(67) 



These values represent, respectively, the local sound- 
crossing time, the local fluid crossing time along each 

coordinate, the local Alfvcn wave crossing time, the lo- 
cal viscous timcscalc, and a radiation timescale deter- 
mined by dividing the value of E returned from the FLD 
solver by the time rate of change in E determined by 
comparing the new E to that from the previous time 
step, ertol is a specified tolerance which retrodictively 
limits the maximum fractional change in E allowed in a 
timestep. AXy^in represents the minimum length of a 3- 
D zone edge, i.e. MIN[dxla, g2b dx2a, gSlb g32b dx3a], 
where each zone length is expressed in terms of the lo- 
cal covariant grid coefficients (cf. appendix B). As ex- 
pressed in (60), dtncw represents a trial value of the new 
time step, which is allowed to exceed the previoiis value 
of the time step by no more than a preset factor; i.e. 
Atfinai = min [Atnew, fac X Atoid]) with "fac" typically 
equaling 1.26. This value allows the time step to increase 
by up to a factor of 10 every 10 cycles. 

3.7. Parallelism 

The most powerful computers available today are par- 
allel systems with hundreds to thousands of processors 
connected into a cluster. While some systems offer a 
shared- memory view to the applications programmer, 
others, such as Beowulf clusters, do not. Thus, to maxi- 
mize portability we have assumed "shared nothing" and 
implemented ZEUS-MP as an SPMD (Single Program, 
Multiple Data) parallel code using the MPI message- 
passing library to accomplish interprocessor communi- 
cation. In this model, parallelism is affected via domain 
decomposition (Foster 1995), in which each CPU stores 
data for and performs operations upon a unique sub- 
block of the problem domain. Because finite-difference 
forms of gradient, divergence, and Laplacian operators 
couple data at multiple mesh points, data must be ex- 
changed between neighboring processors when such op- 
erations are performed along processor data boundaries. 
ZEUS-MP employs "asynchronous" or "non-blocking" 
communication functions which allow interprocessor data 
exchange to proceed simultaneously with computational 
operations. This approach provides the attractive ability 
to hide a large portion of the communication costs and 
thus improve parallel scalability. Details of our method 
for overlapping communication and computation opera- 
tions in ZEUS-MP are provided in appendix F. 

3.8. Linear Solvers 

Our implicit formulation of the RHD equations and our 
solution to Poisson's equation for self gravity require the 
use of an efficient linear system solver. Linear systems 
for a single unknown may involve of order 10^ solution 
variables for a 3-D mesh at low to moderate resolution; 
the number of unknowns in a high-resolution 3-D simula- 
tion can exceed 10®. In this regime, the CPU cost of the 
linear system solution can easily dominate the cost of the 
full hydrodynamic evolution. The choice of solution tech- 
nique, with its associated implementation requirements 



and performance attributes, is therefore critically impor- 
tant. Direct inversion methods such as Gauss-Scidel are 
ruled out owing both to extremely high operation counts 
and a spatially recursive solution which precludes paral- 
lel implementation. As with radiation flux limiters, there 
is no "best" method, practical choices being constrained 
by mathematical factors such as the matrix condition 
number (cf. §3.8.1), coordinate geometry, and boundary 
conditions, along with performance factors such as sen- 
sitivity to problem size and ease of parallel implementa- 
tion. This variation of suitability with problem configu- 
ration motivated us to instrument ZEUS-MP with three 
separate linear solver packages; a preconditioned conju- 
gate gradient (CG) solver, a multigrid (MG) solver, and 
a fast Fourier transform (FFT) solver. We describe each 
of these below. 

3.8.1. The Conjugate Gradient Solver 

The conjugate gradient (CG) method is one example 
of a broad class of non- stationary iterative methods for 
sparse linear systems. A concise description of the the- 
ory of the CG method and a pseudo-code template for a 
numerical CG module is available in Barret et al. (1994). 
While a full discussion of the CG method is beyond the 
scope of this paper, several key elements will aid our dis- 
cussion. The linear systems we wish to solve may be 
written in the form Ax = b, where A is the linear sys- 
tem matrix, x is the unknown solution vector, and & is a 
known RHS vector. An associated quadratic form, f(x), 
may be constructed such that 



fix) = ^x'^Ax 



b X + c, 



(68) 



where c is an arbitrary constant (the "T" superscript de- 
notes a transpose) . One may show algebraically that if A 
is symmetric {A'^ = A) and positive-definite {x^Ax > 
for all non-zero x), then the vector x which satisfies 
Ax = b also satisfies the condition that f{x) is mini- 
mized, i.e. 



fix) 



f/(-) 



-m 



(69) 



The CG method is an iterative technique for finding el- 
ements of X such that (69) is satisfied. A key point to 
consider is that the convergence rate of this approach 
is strongly sensitive to the spectral radius or condition 
number of the matrix, given by the ratio of the largest 
to smallest cigcnvahies of A. For matrices that are 
poorly conditioned, the CG method is applied to "pre- 
conditioned" systems such that 



M-'^Ax = M-'^b, 



(70) 



where the preconditioner, M^^, is chosen such that the 
eigenvalues of {M~^A) span a smaller range in value. 
(Typically, one equates the preconditioner with 
rather than M.) 

Prom (70) it follows at once that the "ideal" precondi- 
tioner for A is simply A~^, which of course is unknown. 
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TABLE 1 

MULTIGRID V-CYCLE ITERATION. 



Keyword 



Operation 



Description 



smooth 

compute 

restrict 

solve 

prolong 

update 



2;, 



' 2h 



Smooth error on fine grid via stationary method 

Compute residual on fine grid 

Transfer residual down to coarse grid 

Obtain coarse grid correction from residual equation 

Transfer coarse grid correction up to fine grid 

Update solution with coarse grid correction 



However, for matrices in which the main diagonal ele- 
ments are much larger in magnitude than the off-diagonal 
elements, a close approximation to A^^ may be con- 
structed by defining a diagonal matrix whose elements 
are given by the reciprocals of the corresponding diago- 
nal elements of A. This technique is known as diagonal 
preconditioning, and we have adopted it in the implemen- 
tation of our CG solver. The property in which the di- 
agonal elements of A strongly exceed (in absolute value) 
the values of the off-diagonal elements is known as di- 
agonal dominance. Diagonal dominance is a prerequisite 
for the profitable application of diagonal preconditioning. 
Diagonal preconditioning is an attractive technique due 
to its trivial calculation, the fact that it poses no barrier 
to parallel implementation, and its fairly common occur- 
rence in linear systems. Nonetheless, sample calculations 
in §5 will demonstrate cases in which diagonal dominance 
breaks down, along with the associated increase in cost 
of the linear solution. 

3.8.2. The Multigrid Solver 

Unlike the conjugate gradient method, multigrid meth- 
ods (Brandt 1977) are based on stationary iterative 
methods. A key feature of multigrid is the use of a hi- 
erarchy of nested coarse grids to dramatically increase 
the rate of convergence. Ideally, multigrid methods are 
fast, capable of numerically solving elliptic PDE's with 
computational cost proportional to the number of un- 
knowns, which is optimal. For example, for a problem 
in three dimensions, multigrid (specifically the full multi- 
grid method, discussed below) requires only 0{k^) oper- 
ations. Compare this to O(fc^logfc) for FFT methods. 
0{k^) for non-preconditioned CG. and approximately 
0{k^-^) for preconditioned CG (Heath 1997). Muhigrid 
has disadvantages as well, however; they are relatively 
difficult to implement correctly, and are very sensitive 
to the underlying PDE and discretization. For example, 
anisotropics in the PDE coefficients or grid spacing, dis- 
continuities in the coefficients, or the presence of advec- 
tion terms, can all play havoc with standard multigrid's 
convergence rate. 

Stationary methods, on which multigrid is based, are 
very simple, but also very slow to converge. For a linear 
system Ax = h, the two main stationary methods are 
Jacobi's method (a^x"^"^ = hi — Ylj^i '^ij^]^) the 
Gauss-Seidel method {auxl^^ = bi — Sj<i (^ij^^j^^ — 
), where subscripts denote matrix and vector 
components, and superscripts denote iterations. While 
stationary methods are very slow to converge to the solu- 
tion (the computational cost for both Jacobi and Gauss- 



Seidel methods is 0{k^ log k) for a k^ elliptic problem in 
31?), they do reduce the high-frequency components of 
the error very quickly; that is, they efficiently "smooth" 
the error. This is the first part of understanding how 
multigrid works. The second part is that a problem with 
a smooth solution on a given grid can be accurately rep- 
resented on a coarser grid. This can be a very useful 
thing to do, because problems on coarser grids can be 
solved faster. 

Multigrid combines these two ideas as follows. First, a 

handful of iterations of a stationary method (frequently 
called a "smoother" in multigrid terminology) is applied 
to the linear system to smooth the error. Next, the resid- 
ual for this smoothed problem is transfered to a coarse 
grid, solved there, and the resulting coarse grid correc- 
tion is used to update the solution on the original ( "fine" ) 
grid. Table 1 shows the main algorithm for the multigrid 
V- cycle iteration, applied to the linear system ChX^ = hh 
associated with a grid with zone spacing h. 

Note that the coarse grid problem (keyword "solve" in 
table 1) is solved recursively. The recursion bottoms out 
when the coarsest grid has a single unknown; or, more 
typically, when the coarse grid is small enough to be 
quickly solved using some other method, such as CG, or 
with a small number of applications of the smoother. 
Also, the multigrid V-cycle can optionally have addi- 
tional applications of the smoother at the end of the it- 
eration. This is helpful to smooth errors introduced in 
the coarse grid correction, or to symmetrize the iteration 
when used as a preconditioner. 

The full multigrid method uses V-cycles in a boot- 
strapping approach, first solving the problem on the the 
coarsest grid, then interpolating the solution up to the 
next-finer grid to use as a starting guess for a V-cycle. 
Ideally, just a single V-cycle at each successively finer 
grid level is required to obtain a solution whose error is 
commensurate with the discretization error. 

Multigrid methods in ZEUS-MP arc provided using 
an external MPI-parallel C++/C/Fortran package called 
MGMPI (Bordner 2002). It includes a suite of Krylov 
subspace methods as well as multigrid solvers. The user 
has flexible control over the multigrid cycling strategy, 
boundary conditions, depth of the multigrid mesh hier- 
archy, choice of multigrid components, and even whether 
to use Fortran or C computational kernels. Paralleliza- 
tion is via MPI using the same domain decomposition 
as ZEUS-MP. Currently there are limitations to grid sizes 
in MGMPI: there must be Af2^ — 1 zones along axes 
bounded by Dirichlet or Neumann boundary conditions, 
and M2^ zones along periodic axes, where L is the num- 
ber of coarse grids used, and M is an arbitrary integer. 
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Fig. 6. — Block-to-slab decomposition of density data before call- 
ing FFTW library and slab-to-block decomposition of gravitational 

potential afterward. 

This restriction is expected to change in future versions 
of MGMPI. 

3.8.3. The Fast Fourier Transform Solver 

As mentioned in §3.4, PFT algorithms offer a highly ef- 
ficient method in solving the Poisson equation. The pub- 
licly available "Fastest Fourier Transform in the West" 
(FFTw) algorithm (Frigo & Johnson 2005) is used as one 
of the gravity solvers available in ZEUS-MP. Note that 
the parallelized version of the FFTw library using MPI 
is only available in version 2.1.5 or before. Based on this 
version of MPI FFTw, the gravity solver implemented 
in ZEUS-MP is valid only for Cartesian meshes with 
triply-periodic boundaries. 

The transform data used by the FFTw routines is dis- 
tributed, which means that a distinct portion of data 
resides in each processor dming the transformation. In 
particular, the data array is divided along the first di- 
mension of the data cube, which is sometimes called a 
slab decomposition. Users can design their data layout 
using slab decomposition as the FFTw requires but it 
is inconvenient for solving general problems. Therefore, 
there is a preprocessing of the density data distribution 
from a general block domain decomposition to a slab de- 
composition before calling the FFTw routines. After the 
potential is calculated, the slab decomposition of the po- 
tential is transformed back to block decomposition in the 
postprocessing stage. Figure 6 shows the idea of these 
two additional processes. 

In figure 6, the initial density data block is first re- 
arranged from block decomposition into slab decompo- 
sition. In this example using a 2 x 2 x 2 topology, 
the first layer of blocks will be divided into four slabs. 
The four processors exchange the data slabs to ensure 
the density data remains organized at the same spatial 
location. Therefore, the data exchange can be viewed 
as a rearrangement of the spatial location of processors. 
Since data in the second layer of blocks does not overlap 



with the first layer of data blocks, the non-blocking data 
communication among blocks in different layers can pro- 
ceed simultaneously. After the gravitational potential is 
calculated, the reverse rearrangement of potential data 
from slab decomposition back to block decomposition is 
performed in an analogous manner. 

Because of the required slab decomposition in FFTw, 
the number of processors that can be used for a given 
problem size is limited. For example, for a problem with 
512^ zones on 512 processors, each slab least one cell 
thick. Using more than 512 processors in this example 
will not lessen the time to solution for the potential as 
extra processors would simply stand idle. 

4. VERIFICATION TESTS 

In this section we present results from a suite of test 
problems designed to stress each of ZEUS-MP 's physics 
modules and verify the correct function of each. We be- 
gin with a pure HD problem which follows shock propa- 
gation in spherical geometry. We then examine a trio of 
MHD problems, two of which were considered in Stone 
& Norman (1992b), and the final of which has become 
a standard multidimensional test among developers of 
Godunov-based MHD codes. The section concludes with 
two radiation problems, the first treating radiation dif- 
fusion waves through static media; the second following 
the evolution of radiating shock waves. All problems 
with shocks use a quadratic (von Neumann-Richtmyer) 
artificial viscosity coefficient qcon of 2.0. The Orszag- 
Tang vortex test uses an additional linear viscosity with 
a value of qlin = 0.25. 

Three of the test problems discussed below which in- 
clude hydrodynamic effects are also adiabatic: these are 
the Sedov- Taylor HD blast wave, the MHD Riemann 
problem, and the Orszag-Tang MHD vortex problem. In 
these cases, the total fluid energy integrated over the grid 
should remain constant. Because ZEUS-MP evolves an 
internal gas energy, rather than total fluid energy, equa- 
tion, the ZEUS-MP solution scheme is non-conservative 
by construction. It therefore behooves us to monitor and 
disclose errors in total energy conservation as appropri- 
ate. For the three adiabatic dynamics problems noted 
above, the total energy conservation errors (measured 
relative to the initial integrated energy) were 1.4%, 0.8%, 
and 1.6%, respectively. 

For problems involving magnetic field evolution, an ad- 
ditional metric of solution fidelity is the numerical adher- 
ence to the divergence-free constraint. As shown analyt- 
ically in §3.2 and previously in Hawley & Stone (1995); 
Stone & Norman (1992b); Evans & Hawley (1988), the 
Constrained Transport algorithm is divergence-free by 
construction, regardless of the method chosen to com- 
pute the EMF's. Nonetheless, all numerical codes which 
evolve discrete representations of the fluid equations are 
vulnerable to errors bounded from below by machine 
round-off; we therefore compute V-B/|B| at each mesh 
point and record the largest positive and negative val- 
ues thus obtained. For the Alfven rotor and MHD Rie- 
mann problems, which are computed on ID grids, the 
maximum normalized divergence values at the end of 
the calculations remain formally zero to machine toler- 
ance in double precision (15 significant figures). For the 
2D Orszag-Tang vortex, the divergence-free constraint 
remains satisfied to within roughly 1 part in 10^^, con- 
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4 6 
Radius 

Fig. 7. — Density vs. radius for the Scdov- Taylor blast wave. 
Density is plotted in units of 10^*^ g crn^'*; radius is plotted in 
units of 10^'^ cm. Profiles represent nurnerieal results plotted in 
intervals of 6 x 10* seconds. The dashed line shows the analytic 
value for the peals density. 

sistcnt with machine round-off error over an evolution of 
roughly 1000 timcsteps. 

4.1. Hydrodynamics: Sedov-Taylor Blast Wave 

Our first test problem is a classic hydrodynamic test 
due to Sedov (1959), in which a point explosion is in- 
duced at the center of a cold, homogeneous sphere in the 
absence of gravity. Problem parameters are chosen such 
that the explosion energy is orders of magnitude larger 
than the total internal energy of the cloud. In this case, 
the resulting shock wave evolves in a self-similar fashion 
in which the shock radius and velocity evolve with time 
according to 

1/5 



vsh = U (^) (71) 



and 



Vsh 



Eo 

Po 



1/5 



t 



-3/5 



(72) 



where Eo and po are the explosion energy and the ini- 
tial density, respectively, (^sh is a dimensionless constant 
which is equal to 1.15 for an ideal gas with 7 = 5/3. The 
density, pressure, temperature, and fluid velocity at the 
shock front are given by 



Ps = 
Ps = 

= 



PoVsh^ 
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(73) 
(74) 

(75) 
(76) 




Fig. 8. — Shock radius vs. time for the Sedov-Taylor blast 
wave. Time is plotted in units of 10* seconds in the right-hand 
figure. Open circles represent numerical results plotted at the times 
corresponding to profiles in Figure 7. The solid line indicates the 
analytic solution. 

Our problem was run in one dimension on a mesh of 
500 zones equally spaced in radius. We initialize a spher- 
ical cloud of radius 10^** cm with a uniform density of 
10~* g/cm^. The initial temperature is 50 K. At t = 0, 
gj.gg internal energy are deposited within a ra- 
dius of 10^^ cm, which spreads the blast energy over 5 
zones. Depositing the energy over a few zones within a 
small region centered on the origin maintains the point- 
like nature of the explosion and markedly improves the 
accuracy of the solution relative to that obtained if the 
all of the energy is deposited in the first zone. 

Figures 7 and 8 provide results of our Sedov-Taylor 
blast wave test. Figure 7 shows radial plots of density 
separated in time by 6 x 10^ seconds. The density and 
radius are expressed in units of 10^^ g/cm^"' and cm, re- 
spectively. The dashed line indicates the analytic value 
for the density at the shock front. Figure 8 shows nu- 
merical values (open circles) of the shock front at times 
identical to those used in the Figure 7. These data are 
superimposed upon the analytic solution (solid line) for 
the shock radius given by equation 71. 

4.2. MHD 

4.2.1. Magnetic Braking of an Aligned Rotor 

Our first MHD test examines the propagation of tor- 
sional Alfvcn waves generated by an aligned rotor. A 
disk of uniform density pa, thickness za, and angular ve- 
locity Oq lies in an ambient, initially static, medium with 
density Pm- The disk and medium are threaded by an ini- 
tially uniform magnetic field oriented parallel to the rota- 
tion axis of the disk. Considered in cylindrical geometry, 
rotation of the disk produces transverse Alfvcn waves 
which propagate along the Z axis and generate non-zero 
components of velocity and magnetic field. Analytic 
solutions for v,j, and were calculated by Mouschovias 
& Paleologou (1980) under the assumption that only the 
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Fig. 9. — MHD aligned rotor: angular velocity vs. Z. The dashed 
line indicates the analytic solution of Mouschovias & Paleologou 
(1980). 
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Fig. 10. — MHD aligned rotor: vs. Z. The dashed line 
indicates the analytic solution of Mouschovias & Paleologou (1980). 

transverse Alfvcn wave modes are present; to reproduce 
these conditions in ZEUS-MP, compressional wave modes 
due to gradients in gas and magnetic pressures are arti- 
ficially suppressed. The utility of this restriction lies in 
the fact that in more general calculations, errors in the 
propagation of Alfvcn waves may easily be masked by 
the effects of other wave modes in the problem. 

The problem parameters as described above corre- 
spond to the case of discontinuous initial conditions con- 
sidered in Mouschovias & Paleologou (1980). We con- 
sider a half-plane spanning the range < Z < 15, 
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Fig. 11. — MHD Riemann problem: density vs. X at t = 80 sec. 
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MHD Riemann problem: 1-velocity vs. X at t 
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with pd = 10 and pm = 1- Because there are no dy- 
namical phenomena acting along the radial coordinate, 
we may compute the problem on a 1-D grid of Z on 
which R- and ^invariant values of and are com- 
puted. Figures 9 and 10 show results of the calculation 
at a time t = 13. Solid curves indicate the solutions 
from ZEUS-MP; dashed lines show the analytic solution 
of Mouschovias & Paleologou (1980). These results are 
consistent with those obtained with ZEUS-2D and shown 
in Stone & Norman (1992b); the only salient difference 
between the two calculations is that we used twice as 
many zones (600) as reported for the ZEUS-2D calcu- 
lation. The increased resolution is mandated by the 
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Fig. 13. — MHD Riemann problem: 2-velocity vs. X at t = 80 
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Fig. 14. — MHD Riemann problem: 3-velocity vs. X at t = 80 



fact that the HSMOCCT algorithm is by construction 
more diffusive than the original MOCCT algorithm docu- 
mented in Stone & Norman (1992b). As noted previously 
in section 3.2 and discussed in detail in Hawley & Stone 
(1995), this added diffusivity makes the MOCCT algo- 
rithm more robust in fully multidimensional calculations. 
The requirement within HSMOCCT of higher resolution 
with respect to ZEUS-2D's older MOCCT algorithm is 
maximized in this test problem due to the artificial sup- 
pression of compressive liydrodynamic waves and longi- 
tudinal MHD waves; the true resolution requirements of 
HSMOCCT as implemented in ZEUS-MP wih depend in 
part upon the relative importance of various wave modes 
to the calculation and will in general be problem depen- 
dent. 

4.2.2. MHD Riemann Problem 

Our second MHD problem is a magnetic shock tube 
problem due to Brio & Wu (1988). Test results with 
ZEUS-2D using the van Leer advection algorithm were 
also published in Stone & Norman (1992b). This test 
problem is described by "left" and "right" states in which 
the discontinuous medium is threaded by a magnetic field 
which is uniform on both sides but exhibits a kink at the 
material interface. Our formulation of the problem dif- 
fers from that of Brio & Wu (1988) and Stone & Norman 
(1992b) only in that we have oriented the transverse com- 
ponent of the magnetic field to have non-zero components 
along both the Y and Z axes in Cartesian geometry. At 
t = 0, our left state is given by {pi, pi, Bf, Bf, Bf) = 
(1.0, 1.0, 0.75, 0.6, 0.8), and the right state is given by 
{pr, Pr, B?, By, B^) = (0.125, 0.1, 0.75,-0.6,-0.8). All 
velocities are initially zero. The ratio of specific heats for 
this problem is 2.0. As with the calculation in Stone & 
Norman (1992b), the problem is computed in 1-D on an 
800-zone grid and run to a time of t = 80. The spatial 
domain length is 800. 

Figures 11 through 16 show the results obtained 
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Fig. 15. — MHD Riemann problem: 2-component of magnetic 
field vs. X at t = 80 sec. 

with ZEUS-MP. The 1-componcnt of B (not inchidcd 
in the figures) remained flat over the domain at its ini- 
tial value, as expected. The grid resolution is identical 
to that used in the ZEUS-2D calculation, and the results 
are evidently consistent (see Fig. 6 of Stone & Norman 
(1992b)). While this problem is not truly multidimen- 
sional, it does exhibit both transverse and comprcssional 
wave modes, in contrast with the previous test problem. 
In this case, we may qualitatively match the results from 
ZEUS-2D without an increase in grid resolution. 

4.2.3. Orszag-Tang Vortex 

Our final MHD test problem is a multidimensional 
problem due to Orszag & Tang (1979) which has been 
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Fig. 16. — MHD Riemann problem: 3-component of magnetic 
field vs. X at t = 80 sec. 

featured in a number of recent MHD method papers, 
such as Dai & Woodward (1998); Ryu et al. (1998) 
and Londrillo & Del Zanna (2000); Londrillo & del Zanna 
(2004). This problem follows the evolution of a 2-D 
periodic box of gas with 7 = 5/3, in which fluid ve- 
locities and magnetic field components are initialized 
according to v = Vq [— sin (27ry) x + sin {2ttx) y] and 
B = Bo [— sin (27ry) X + sin (47ra;) y] , with vq = 1 and 
Bq = l/(47r)-^/^. The box has length 1.0 along each side. 
The density and pressure are initially uniform with values 
of 25/(367r) and 5/(127r), respectively (these choices lead 
to an initial adiabatic soimd speed of 1.0). Subsequent 
evolution leads to a complex network of waves, shocks, 
rarefactions, and stagnant flows. Ryu et al. (1998) pro- 
vide greyscale snapshots of the flow fleld at t = 0.48; in 
addition, they provide 1-D cuts through the data along 
the line given hy y = 0.4277, over which the gas and mag- 
netic pressures are plotted as functions of x. The Ryu 
et al. (1998) results were computed on a 256^-zone Carte- 
sian mesh. For consistency, we also computed the prob- 
lem on a 256^-zone mesh, from which comparison values 
of pressure at the identical cut in y may be extracted. 
To explore the effect of resolution, we also provide 2-D 
greyscale images from a 512^-zonc calculation. 

Our multidimensional flow structures at t = 0.48 are 
given in Figures 17 through 20, which are to be com- 
pared to the grey-scale panels on the left-hand side of 
Figure 3 in Ryu et al. (1998). Figure 21 presents line 
plots of gas and magnetic pressure along a line of x lo- 
cated a.t y = 0.4277. Save for a very small notch in the 
gas pressure near x = 0.5, our pressure proflles from the 
256^ calculation appear to be virtually identical to those 
from Ryu et al. (1998) at identical resolution. With re- 
spect to the 2-D images, the effect of resolution is most 
apparent in maps of the velocity divergence (Figures 19 
and 20). Again, the ZEUS-MP results at a grid resolu- 
tion of 256^ compare quite favorably to those from Ryu 
et al. (1998), with subtle flow features marginally less 
well resolved. Our results are likewise consistent with 




Fig. 17. Orszag-Tang vortex: pressure at t = 0.48 seconds. 

256'^ zones were used. 




Fig. 18. — Orszag-Tang vortex: pressure at t 
512^ zones were used. 



0.48 seconds. 



those computed at similar resolution by Dai & Wood- 
ward (1998) and Londrillo & Del Zanna (2000) (Londrillo 
& del Zanna (2004) also computed the problem but did 
not include figures matching the other cited works). The 
256^ and 512" results clearly bracket those of Ryu et al. 
(1998); thus we see that in this problem axial resolution 
requirements of the two codes differ by at most a factor 
of 2, which we consider an agreeable result for a finite- 
difference staggered-mesh code. 

4.3. Radiation 
4.3.1. Marshak Waves 
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Fig. 19. — Orszag-Tang vortex: V - v at t ■ 
256^ zone calculation. 
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Fig. 20. — Orszag-Tang vortex: V • v at t = 0.48 seconds for the 
512^ zone calculation. 



We begin our examination of radiation physics with a 
test problem emphasizing the eouphng between matter 
and radiation. The Marshak wave problem we compute 
was formulated by Su & Olson (1996) after a description 
in Pomraning (1979). The problem considers the heating 
of a uniform, semi-infinite slab initially at T = every- 
where. The material is characterized by a T-independent 
(and therefore constant) opacity (k) and a specific heat 
(a) proportional to T^, in which case the gas and radi- 
ation energy equations become linear in the quantities 
E and T^. Pomraning defined dimensionless space and 
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Fig. 21. — Gas and magnetic pressures vs. x along y = 0.4277 
at time t = 0.48 sec. 

time coordinates as 
and 



X = \/3kz, 



(77) 



t, 



(78) 



and introduced dimensionless dependent variables, de- 
fined as 



and 



v{x,t) = {£j 



-Fine 

aT'^{z,t) 



(80) 



In (79) and (80), Fine is the incident boundary flux. 
With the definitions given by (77) through (80), Pomran- 
ing showed that the radiation and gas energy equations 
could be rewritten, respectively, as 



du(x,T) d^uix^r) , , , , 
e — TT-^ — = v[x,t) — u[x,t), 



and 



dr dx'^ 

dv{x, t) 
dr 



u{x, t) — v{x, r). 



subject to the following boundary conditions: 
m(0, r) - — \' ' = 1, 



and 



\/3 dx 

ti(oo,r) = u(x, 0) = v{x,0) 



0. 



(81) 
(82) 

(83) 

(84) 



The user-specified parameter e is related to the radia- 
tion constant and specific heat through 

(85) 



a 



With a choice of e, the problem is completely specified 
and may be solved both numerically and analytically. For 
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Fig. 22. — Curves of log(U) and log(V) vs. log(X) for 2 values of 
T. Curves show analytic solutions; circles indicate numerical data. 

the ZEUS-MP test, we chose a 1-D Cartesian grid with 
200 zones and a uniform density of 1 g cm~^. The do- 
main length is set to 8 cm, and the photon mean-free path 
{k~^) is chosen to be 1.73025 cm. Because this problem 
was designed for a pure diffusion equation, no flux lim- 
iters were used in the FLD module, e was chosen to be 
0.1, allowing direct comparison between our results and 
those given by Su & Olson (1996). Our results arc shown 
in Figure 22, in which the dimcnsionless energy variables 
u and V arc plotted against the dimcnsionless space co- 
ordinate X at two different values of the dimcnsionless 
time, r. The open circles indicate benchmark data taken 
from the tabulated solutions of Su & Olson (1996); solid 
curves indicate ZEUS-MP results. The agreement is ex- 
cellent. 

4.3.2. Radiating Shock Waves 

The classic text on the theory of shock waves and asso- 
ciated radiative phenomena is due to Zel'Dovich & Raizer 
(1967) (see also Zel'Dovich & Raizer (1969) for a short 
review article on shock waves and radiation). A more 
recent summary of basic concepts is available in Mihalas 
& Mihalas (1984). Radiating shock waves differ qualita- 
tively from their purely hydrodynamic counterparts due 
the presence of a radiative precursor created by radiative 
preheating of material upstream from the shock front. 
The existence of this precursor gives rise to the identifi- 
cation of so-called subcritical and supercritical radiating 
shocks, which are distinguished by a comparison of the 
gas temperature behind the shock front to that in the ma- 
terial immediately upstream from the shock. In the case 
of subcritical shocks, the post-shock gas temperature ex- 
ceeds the upstream value, and the radiative precursor 
is relatively weak. As the shock velocity is increased 
beyond a critical value, however, the upstream gas tem- 
perature becomes equal to (but never exceeds) the post- 
shock temperature; such shocks show very strong radia- 
tive preheating of the unshocked gas and are identified 
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Fig. 23. — Subcritical radiating shock; matter and radiation 
temperatures vs. comoving X coordinate. Plot times are 5400, 
l.TxlO-*, 2.8x10*, and 3.8x10* seconds. 
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Fig. 24. — Supercritical radiating shock; matter and radiation 
temperatures vs. comoving X coordinate. Plot times are 860, 4000, 
7500, and 1.3x10* seconds. 



as supercritical shocks. 

A numerical prescription for radiating shock test prob- 
lems appropriate for astrophysical simulation codes was 
published by Ensman (1994); this configuration was re- 
visited by Gehmeyr & Mihalas (1994) and again by Sin- 
cell et al. (1999a,b) and Hayes & Norman (2003). In 
this model, a domain of length or radius 7 x 10 cm 
and an initially uniform density of 7.78 x 10~* g cm~'^ 
is given an initial temperature profile such that T falls 
smoothly from a value of 85 K at the inner boundary 
to 10 K at the outer boundary. The non-zero gradient 
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was necessary to avoid numerical difficulties in Ensman's 
VISPHOT code. A constant opacity of 3.1 x 10"^" is cho- 
sen, which yields a photon mcan-frcc path roughly 5% of 
the domain length. Because the VISPHOT code uses a 
Lagrangean mesh, the shock is created by a "piston" af- 
fected by choosing an inner boundary condition on the 
fluid velocity. ZEUS-MP recreates this condition on an 
Eulerian grid by initializing the fluid velocity throughout 
the domain and outer boundary to the (negative of) the 
required piston velocity. The subcritical shock and su- 
percritical shock tests share all problem parameters save 
for the piston velocity, chosen to be 6 km/s in the former 
case and 20 km/s for the latter. 512 zones were used to 
execute the problem on a 1-D mesh. 

Figures 23 and 24 present temperature profiles for the 
subcritical and supercritical cases, respectively. To aid 
comparison of our Eulerian results to the Lagrangean re- 
sults of Ensman (1994), we transform the coordinate axis 
into the rest frame of the unshocked matter. Solid lines 
indicate gas temperature; dashed lines indicate a radi- 
ation "temperature" defined by T,. = (E/a,.)^^^, where 
Or is the radiation constant. Note the strongly pre- 
heated material ahead of the shock front in the super- 
critical case. Our results were computed in Cartesian 
geometry; those from Ensman (1994) were computed 
in a thin spherical shell with a large radius of curva- 
ture. This problem was also treated by Hayes & Norman 
(2003) using ZEUS-MP coupled to a' parallel VTEF al- 
gorithm. Because that code was designed specifically for 
2-D cylindrically-symmetric problems, the problem ge- 
ometry was diff'erent in that the radiating surface was 
planar yet of finite transverse extent, whereas these re- 
sults consider a formally infinite plane. This difference 
results in somewhat different peak values for tempera- 
ture, but otherwise the results are qualitatively consis- 
tent. 

5. PERFORMANCE TESTS 

Section 4 considered test problems which gauge the 
accuracy of the code. This section considers issues of 
numerical performance, with a particular emphasis on 
problems distributed among large numbers of parallel 
processors. 

5.1. Aspects of Scalability 

The topic of parallel performance is most often encap- 
sulated in the notion of scalability^ which in this con- 
text is typically assessed by measuring the reduction in 
CPU time for a given quantity of numerical work as this 
work is distributed among an increasing number of pro- 
cessors. Relative to the cost on one CPU, perfect scal- 
ability would be represented by a cost reduction factor 
of l/N when the same job is distributed across N pro- 
cessors. For tasks in which each processor can operate 
upon its portion of data independently of all other pro- 
cessors (a so-called embarrassingly parallel operation), 
perfect scalability is trivially achieved. Algorithms which 
compute solutions to spatially-discrctized PDE's arc by 
construction not embarrassingly parallel because the dis- 
crete spatial derivative operators employ stencils that 
overlap processor tile boundaries along the tile edges. 
On distributed- memory computers, data communication 
will therefore be required. Efficient management of this 



TABLE 2 

Radiation Diffusion Test Parameters. 



Medium 


Outer Radius (cm) 


p (g/cm^) 


Temperature (eV) 


D-T gas 


0.087 


0.025 


0.025 


D-T ice 


0.095 


0.25 


0.025 


C-H foam 


0.111 


1.20 


0.025 


He gas 


0.490 


0.01 


300 



communication is thus a key ingredient to an efficient 
parallelization strategy. 

More generally, scalability describes the sensitivity of 
an algorithm's CPU cost to a number of factors, of which 
parallelism is a leading but by no means unique member. 
Section 3.8.2 compared the cost of MG linear solvers to 
traditional stationary methods for a given problem size; 
the costs of the two methods exhibit very different de- 
pendencies upon the number of unknowns in the linear 
system. For iterative methods such as CG and MG, the 
required number of iterations for a converged solution is 
the primary factor in algorithm cost. Solvers whose it- 
eration counts vary more weakly with problem size are 
to be favored for very large problems. In ideal cases, the 
required number of iterations for convergence of an MG 
solver can be virtually independent of the problem size, 
thus MG is often said to scale well to large problems. 
Because this behavior is orthogonal to the issue of par- 
allel decomposition, we identify this as an independent 
definition of scalability. 

An additional factor bearing on the cost of an iterative 
linear system solution is diagonal dominance, a condi- 
tion in which matrix elements along the main diagonal 
are much larger in magnitude than off-diagonal elements 
along a given row. Matrices resulting from the discretiza- 
tion of the time-dependent diffusion equation exhibit di- 
agonal dominance that varies directly with the size of 
the time step. To see this, consider the static diffusion 
equation discretized on a 1-D mesh with uniform spacing, 
Ax. For a static medium, the radiation energy equation 
becomes 

rJF 

— - V • £>VE = S, (86) 

where S contains local source terms. We assume a spa- 
tially uniform medium (yielding a spatially constant D) 
and write this equation in discrete form as 



E"+i - E" 



D 



(Aa.) 



(Er+/-2E^^+E!!_+i) = Si, 

(87) 

which may be rearranged to make the linear system 
structure obvious: 



" DM ' 


T?n+1 


_(Ax)2 


" DM ' 


pn-l-1 





2DM 



(Aa;)2 
E" + SiM. 



e: 



(88) 



The relationship between time step and diagonal domi- 
nance is manifest: in the limit that At 0, the linear 
system represented by (88) reduces to the identity ma- 
trix! In the opposite limit, the off-diagonal elements are 
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Fig. 25. — Radiation diffusion (ID); profiles of radiation tem- 
perature (solid lines) and gas temperature (dashed lines) at times 
of 0.0, 0.1, 0.5, 0.7, 1.0, and 10 ns. The initial profiles axe step 
functions with the discontinuities located at R = 0.11 cm. The 
radiation wave propagates leftward, with the gas temperature lag- 
ging the radiation temperature, particularly in the optically thick 
regions. 
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Fig. 26. — Radiation diffusion (ID): logarithms of the time step 
(solid line) and mean number of CG iterations per N-R iteration 
(dashed line) as functions of the evolution time. Times are mea- 
sured in seconds. 

comparable in magnitude to the main diagonal, a situa- 
tion which results in greatly increased numbers of itera- 
tions required for convergence in a CG linear solver such 
as that implemented in our FLD module. 

We demonstrate this behavior with a radiation diffu- 
sion problem in which a cold sphere is immersed in a 



high-temperature radiation field. Problem parameters 
are given in Table 2. The initial values of density and 
temperature are taken from a test problem given in Hayes 
& Norman (1999) designed to qualitatively mimic fea- 
tures of an Inertial Confinement Fusion (ICF) simulation 
such as that used by Baldwin et al. (1999) to compare 
numerical performance of CG-bascd and MG-based lin- 
ear system solvers. The irradiated sphere is constructed 
of layers with strongly disparate densities and photon 
mean-free paths. The actual physical system this prob- 
lem imitates (albeit crudely) is a sphere of D-T gas sur- 
rounded by a solid D-T "ice" of higher density, itself sur- 
rounded by a carbon-hydrogen foam of yet higher den- 
sity. This assembly is immersed in a low density He gas 
subjected at t = to an intense radiation field with a 
characteristic temperature of 300 eV. Opacities for real 
ICF materials have complex dependencies on energy and 
composition; our toy problem captures the gross features 
of the mean-free path variation via the following expres- 
sion: 

with Aq, Poi Tqi and /i given by 10^*^', 1.2 g/cm^, 
0.025 eV, 2.0, and 1.2, respectively. A further restric- 
tion is placed on the resulting opacities such that the 
minimum and maximum allowed values of x ^-rc 10 and 
10® cm^^. This restriction filters out unphysically high 
and low values of the absorption coefficients. The im- 
portant feature of this problem is that low-density gas 
is surrounded by two solids of much higher (and differ- 
ing) densities. This construction rc^sults in an inward- 
propagating radiation diffusion wave with a highly vari- 
able rate of progress. Snapshots of the gas and radia- 
tion temperatures at times of 0, 0.1, 0.5, 0.7, 1.0, and 
10 nanoseconds are given in Figure 25. Histories of the 
time step and average number of CG iterations required 
for NR iteration in the FLD module are plotted against 
evolution time in Figure 26. The choppy appearance of 
the plots is an artifact of sampling (every tenth cycle was 
archived for plotting). For static diffusion problems or 
RHD problems characterized by rapidly time- varying ra- 
diation fields, evolution of the time step will be strongly 
constrained by the maximum allowed fractional change 
in the radiation energy (= 0.01 in this problem). The 
initial time step progression is upward as the exterior 
radiation field slowly diffuses through the opaque foam 
layer. By 0.5 ns, the radiation has diffused through the 
foam layer and begun to penetrate the less opaque D-T 
ice layer, at which time the time step drops sharply ow- 
ing to the more rapid evolution of the radiation energy. 
The time step trends upward again until the radiation 
wave breaks through the D-T gas/ice boundary. The 
time step then drops again as the radiation streams into 
the central region. The final evolution of the time step 
is upward as the problem domain reaches its final equi- 
librium. The equilibrium temperature is lower than the 
initial exterior value because a reflecting outer boundary 
was chosen rather than an imposed boundary flux. 

The dashed line in Figure 26 shows that the num- 
ber of CG linear system iterations required to solve the 
FLD matrix during an outer N-R iteration closely par- 
rots the time step behavior. This is understood as a 
natural consequence of time-dependent diagonal domi- 
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Fig. 27. Strong-scaling comparison: relative sijecdup vs. num- 
ber of CPUs for problems of fixed size. The solid curves indicates 
perfect N-fold scaling with CPU count. 256'^ zones were used for 
each problem; the maximum CPU number was 1024 for cacli case. 

nance of the matrix as illustrated by equation 88. This 
exercise demonstrates the existence of a third dimension 
of scalability of partictilar relevance to timc-dcpcndcnt 
simulations: the dependence of CPU cost upon time step 
size. When a linear system is very nearly diagonal, both 
CG and MG will converge rapidly, but because a sin- 
gle iteration of full MG is more expensive than a single 
CG iteration, one may expect situations in which CG 
presents a more economical solution strategy for a given 
problem size. That increasing time step size could pro- 
vide a "cross over" point with regard to optimal method 
is a logical consequence. This issue was investigated ex- 
tensively by Baldwin et al. (1999) in the context of 2-D 
RHD simulations in spherical geometry. They considered 
RHD problems in which the time step varied naturally by 
orders of magnitude during the course of the simulation, 
and noted indeed that no one method provided the best 
economy over the entire calculation. While adaptive se- 
lection of linear solvers in a particular physics module has 
not been implemented in ZEUS-MP, we note that exper- 
imention along such lines in the context of astrophysical 
problems is an enticing candidate for future research. 

5.2. Parallel Performance Results 

We explore additional aspects of algorithm perfor- 
mance with a quartet of test problems computed on 3-D 
grids with 256'' zones. The first two problems used are 
non-magnetic and magnetic variants of a simple blast 
wave test in which a sphere with initial overdensity and 
overpressure ratios of 100 and 10^ is defined with re- 
spect to a uniform background medium. The problem 
is defined on a Cartesian grid. The magnetic version 
augments the problem setup with a uniform magnetic 
field aligned with the Z axis. The third problem is a 
3-D calculation of radiation diffusion into an ICF cap- 
sule, with problem parameters as given previously. The 
fourth problem is the gravitational collapse of a pres- 
sureless cloud, using problem parameters given in Stone 



TABLE 3 

Fixed- Work Scaling: 256^ MHD (30 Time Steps) 



Processors CPU Time (sec) Speedup Parallel Efficiency (%) 



1 


9391 






2 


4624 


2.031 


102 


4 


2236 


4.200 


105 


8 


1097 


8.561 


107 


16 


504.9 


18.60 


116 


32 


248.1 


37.85 


118 


64 


128.4 


73.14 


114 


128 


64.60 


145.4 


114 


256 


40.45 


232.2 


90 


512 


21.84 


430.0 


84 


1024 


8.91 


1053 


103 



TABLE 4 

Fixed- Work Scaling: 256^ HD (50 Time Steps) 



Processors CPU Time (sec) Speedup Pcirallel Efficiency (%) 



1 


5256 






2 


2746 


1.91 


96 


4 


1323 


3.97 


99 


8 


669.8 


7.85 


98 


16 


335.4 


15.7 


98 


32 


165.4 


31.8 


99 


64 


89.25 


58.9 


92 


128 


43.14 


122 


95 


256 


20.85 


252 


98 


512 


11.69 


450 


88 


1024 


6.450 


815 


80 



& Norman (1992a). 

For each problem, parallel performance is measured by 
a so-called strong- scaling test in which the total number 
of zones (and therefore the total amount of computa- 
tional work) is held constant as the problem is repeated 
with increasing numbers of CPU's. Each problem is run 
for a small number of cycles (typically 30 to 50) which 
is held fixed for each trial. Figure 27 and Tables 3, 4, 5, 
and 6 summarize the results for the MHD blast wave, HD 
blast wave, radiation diffusion, and gravitational collapse 
tests. The number of timesteps for which each test was 
run is indicated in the title of each table, from which sin- 
gle time step costs may be derived. In this example, the 
gravitational collapse problem solved Poisson's equation 
with the CG linear solver, which is also used in the dif- 
fusion test. It is important to note that in this type of 
scaling study where the total problem size is held fixed, 
parallel scalability will inevitably break down for a suf- 
ficiently large number of processors, due in large part to 
surface-to-volume effects; when the local processor data 
block is too small, the communication cost of shipping 
data along the block's surfaces will compete with the 
computational cost of processing the full block volume. 
The number processors necessary to induce a turnover in 
a code's parallel scalability behavior will depend strongly 
on the level of communication required by the algorithm, 
a point we demonstrate in the experiments that follow. A 
competing technique for measuring scalability, known as 
a weak-scaling test, holds the processor block size con- 
stant and thus scales the total problem size with the 
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TABLE 5 

Fixed- Work Scaling: 256^ FLD (20 Time Steps) 



Processors CPU Time (sec) Speedup Parallel Efficiency (%) 



2 


3184 






4 


1569 


2.03 


102 


8 


887.1 


3.59 


90 


16 


509.8 


6.25 


78 


32 


269.9 


11.8 


74 


64 


146.9 


21.7 


68 


128 


71.12 


44.8 


70 


256 


36.38 


87.5 


68 


512 


20.15 


158 


62 


1024 


12.38 


257 


50 



TABLE 6 

Fixed-Work Scaling: 256^ Grav-CG (5 Time Steps) 



Processors CPU Time (sec) Speedup Parallel Efficiency (%) 



1 


11430 






2 


7013 


1.63 


82 


4 


3696 


3.09 


77 


8 


2300 


4.97 


62 


16 


1666 


6.86 


83 


32 


866.5 


13.2 


41 


64 


422.8 


27.0 


42 


128 


184.8 


61.9 


48 


256 


76.97 


149 


58 


512 


67.47 


169 


33 


1024 


47.64 


240 


23 



number of processors. This alternative has some util- 
ity: if, for example, one determines that twice the grid 
resolution is required to satisfy a given accuracy metric, 
one may investigate if doubling the number of processors 
along the axis preserves the cost of computing a time step 
without degrading parallel performance. While this is a 
relevant consideration, we eschew weak-scaling studies 
in this paper because (1) with a sufficiently large block 
of data on each processor, even poor message-passing 
implementations of parallelism can perform reasonably 
well, and (2) the characteristics of the problem under 
study change as the zone number increases. For Courant- 
limited calculations, doubling the resolution will double 
the number of time steps needed to complete a calcu- 
lation, which rather offsets the virtues of maintaining a 
constant cost per time step. For problems using implicit 
linear solvers, increasing the total number of zones will, 
to a degree depending on the solver method, increase the 
number of iterations required for convergence at each cy- 
cle. Strong-scaling studies, while providing a harsher test 
of a parallel implementation, speak directly to the ques- 
tion: how rapidly may a research problem of a given size 
be solved? 

The behaviors in Figure 27 reflect the relative impact 
of MPI communication operations on each module. The 
superlative scaling of the MHD tests derives from the 
highly computation-intensive nature of the algorithm. 
The HD test is actually a subset of the MHD problem, 
as both the MHD-specific routines and the HD advec- 
tion algorithms must be used in any MHD problem. The 
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Fig. 28. — A comparison of timings for the gravitational col- 
lapse problem in 3-D using the general CG solver and MGMPFs 
multigrid solver for the Poisson equation. 



radiation and gravity problems are both dominated by 
the cost of CG linear solver. The diffusion problem was 
run for a sufficiently limited number of time steps such 
that an average of eight CG iterations were required at 
each time step. In contrast, when used for the Pois- 
son equation, of order 10^ iterations are required for a 
mesh size of 256'^. Because each CG iteration requires 
both MPI data exchanges at tile boundaries and global 
searches for error minima, high iteration counts result in 
very communication-intensive operations. Parallel effi- 
ciency, which is computed by dividing the speedup rela- 
tive to 1 processor by the processor number, is displayed 
in the fourth column of Tables 3-6. Superlinear speedup 
is observed most dramatically for the MHD test; this be- 
havior is a by-product of strong-scaling studies and arises 
because single-CPU performance is degraded when the 
local data chunk is too large to fit in a processor's cache 
memory. This effect decreases as the per-CPU data size 
shrinks; the deleterious effects of communication then 
begin to appear as the processor counts run into the 
hundreds. Some of the peculiar variations in parallel 
efficiency in the MHD example are likely consequences 
of system and network effects associated with the par- 
ticular machine used. Memory, bandwidth, and latency 
characteristic vary tremendously among different archi- 
tectures; but the major trends shown in Figure 27 and 
the associated tables arc representative and instructive, 
and are internally consistent with the relative reliance of 
each module upon data exchange among processors. 

The fact that the CG solver requires - irrespective of 
parallelism - more iterations for larger problems brings 
with it two liabilities: increased iteration counts boost 
both the total operation cost and the number of MPI 
messages sent and received. The fact that multi-grid 
methods exhibit convergence behavior with little sensi- 
tivity to problem size motivated us to implement the 
independently-developed MGMPI package for use as an 
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alternative Poisson solver in ZEUS-MP. In its current 

form, MGMPI is restricted to 3-D Cartesian grids (which 
are nevertheless a common choice in large astrophysical 
calculations) with Dirichlet or Neumann boundary con- 
ditions (recall that the FFTw solver is offered for triply- 
periodic Cartesian meshes). Figure 28 shows the behav- 
ior of solution time against problem size for the gravi- 
tational collapse problem computed on a 3-D Cartesian 
mesh. Grid sizes of 63^, 127^, 255^, and 511^ zones were 
run. (Odd numbers of zones are required by the multigrid 
V-cycle in MGMPI.) Each trial was distributed across 
64 processors to ensure that the larger problems would 
not exceed single-CPU memory limits. At small grid 
sizes, the CG solver is less expensive than the MGMPI 
solution, but at a mesh size of 511'^ the CG solution 
has clearly diverged with respect to the MGMPI solver. 
The fundamental difference lies in the average number 
of solver iterations required per time step. For the CG 
solver, this number was 32, 56, 99, and 190, respectively, 
for the four problem sizes tested. For MGMPI, this num- 
ber is 2.2 for the smallest problem and grows only to 3 
for the 511"^ run. Despite MGMPI's fairly high oper- 
ation cost per iteration, the insensitivity of its conver- 
gence behavior to problem size guarantees, for a given 
parallel distribution, a performance advantage over the 
CG solver for a sufficiently large problem. In its cur- 
rent form, MGMPI does not employ asynchronous MPI 
calls for its message passing, as does the CG solver. The 
problem size for which MGMPI enjoys a clear advan- 
tage over the CG solver may therefore depend in part 
on the number of processors chosen. Nonetheless, for 
very large problems involving self-gravity on a Cartesian 
mesh, MGMPI is likely to be the preferred option in a 
ZEUS-MP calculation. 

6. SUMMARY 

In the introduction, we advertised the theme of this pa- 
per as "physics, flexibility, and parallelism." That these 
features are defining traits of ZEUS-MP is manifest: hy- 
drodynamics, MHD, and radiation diffusion may be de- 
ployed, singly or in concert, on Cartesian, cylindrical, or 
spherical meshes in one to three dimensions. ZEUS-MP 
demonstrates parallel scalability suitable for computing 
platforms ranging from small clusters to the largest plat- 
forms currently available for unclassified research. Fea- 
tures of a code designed for community use must also 
include accuracy and computational expediency. The 
accuracy of ZEUS-MP has been verified both by tradi- 
tional test problems and a multidimensional MHD prob- 
lem frequently touted by developers of Godunov-based 
MHD codes. Even when additional resolution is required 
to ensure accuracy of a calculation, ZEUS-MP 's parallel 
performance provides a powerful mechanism for keeping 
the required solution times manageable. 

Virtues notwithstanding, we note that there are sev- 
eral ways in which ZEUS-MP may be modified and im- 
proved within its solution paradigm. Non-ideal MHD ef- 
fects such as Ohmic dissipation and ambipolar diffusion 
are requisite in a variety of topics in interstellar physics 
such as star formation and interstellar shocks; method- 
ologies for including these effects in the ZEUS framework 
have been documented by Stone (1999) in the case of 
Ohmic dissipation and likewise by Stone (1997) for am- 
bipolar diffusion. A 3-D version of the VTEF algorithm 



described by Hayes & Norman (2003) would be a major 
undertaking, but we note an approximation suggested 
by Ruszkowski & Begelman (2003) as an improvement 
to FLD suitable for the ZEUS codes. Because ZEUS- 
MP is intended for public distribution, one mission of 
this paper is to provide reference documentation at a 
sufficiently high level of detail so that ambitious code 
developers may modify it for their particular needs. 

An additional area of improvement in which we are cur- 
rently engaged concerns the iterative solvers offered with 
the code. The much higher computational cost of simu- 
lations with FLD and the CG-based self-gravity module 
derives from the very high numbers of iterations required 
for the CG linear solver to converge w1k!u the matrix loses 
its diagonal dominance. Because the matrix generated by 
the discrete Poisson equation is never strongly diagonally 
dominant, CG methods lose favor as the tool of choice 
for the Poisson problem on large grids. The suitability 
of CG to radiation problems is very dependent on the 
physical and temporal character of the problem at hand. 
As shown by Baldwin et al. (1999), suitably optimized 
MG methods may be preferable to CG for some classes 
of radiation problems. Our current MGMPI solver is not 
yet fiexible enough for use in radiation applications, but 
enlarging its scope of applicability is a high priority item 
for future research. 

We also note that the convergence requirements of our 
current CG solver may be dramatically improved with 
the use of a more effective preconditioner (recall the dis- 
cussion in §3.8.1). Our solver uses diagonal precondi- 
tioning, which simultaneously boasts maximal ease of 
implementation and minimal range of effectiveness with 
respect to the condition numbers for which convergence 
is notably improved. Despite the importance of precon- 
ditioning to the performance of linear solvers upon which 
many astrophysical simulations must depend, this topic 
has received relatively little attention in the numerical 
astrophysics literature. One study which has focused 
on astrophysical applications was performed by Swesty 
et al. (2004), who considered a class of preconditioners 
known as "sparse approximate inverse" (SPAI) precon- 
ditioners. As the name implies, SPAI preconditioners 
attempt to construct an approximation to the inverse 
of a matrix which is more sophisticated than the purely 
diagonal approximation, but far less expensive to com- 
pute than the full inverse. Swesty et al. (2004) have 
constructed SPAI preconditioners designed for the lin- 
earized, discrete, energy-dependent version of the FLD 
equation, the so-called multigroup flux-limited diffusion 
(MGFLD) equation. The scientific focus in the Swesty 
et al. (2004) paper is on 2-D and 3-D MGFLD linear 
systems written to compute multidimensional neutrino 
diffusion coupled to hydrodynamic flows in core-collapse 
supernova simulations. While their analysis is designed 
for the MGFLD equations, they consider special cases of 
isoenergetic diffusion directly analogous to the reduced 
system of energy-averaged (or "grey") FLD equations 
adopted in ZEUS-MP (and ZEUS-2D). The results re- 
ported in Swesty et al. (2004) suggest that SPAI pre- 
conditioners may offer a very profitable line of research 
for future FLD implementations in ZEUS-MP or other 
application codes. 
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The release of a vastly redesigned and augmented "Ver- 
sion 2" of ZEUS-MP was occassioncd by ZEUS-MP's 
adoption by the Tcrascalc Supernova Initiative as a com- 
putational platform for simulating core-collapse super- 
nova explosions in multidimensions. The TSI project, led 
by Dr. Anthony Mezzacappa, provided both the demand 
and much of the financial support for the effort which cre- 
ated this code. We are indebted to Tony Mezzacappa for 
his continued support, enthusiasm, and remarkable pa- 



tience during the development phase of ZEUS-MP 2.0. 
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the Lawrence Livcrmorc National Laboratory. This work 
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Science High-Energy, Nuclear, and Advanced Scientific 
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APPENDIX 
A. A MAP OF ZEUS-MP 



TABLE Al 

The Mapping Between ZEUS-MP Subroutines and Equations Solved. 
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Table Al provides a reference listing of the major equations derived in the following appendices and the ZEUS- 
MP subroutines which compute them. The first three columns of the table indicate the solution substep, pertinent 
equation, and associated subroutine, respectively. The latter three columns are headed by labels defining three classes 
of simulation: purely hydrodynamic, MHD, and RHD. In each column a 'V mark indicates that the equation on 
that fine is include in the solution update. Minor headings reading "SRCSTEP" and "TRANSPRT" (which reference 
subroutines with those names), respectively, indicate the two major groups of solution substeps introduced in section 3 
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Entries in the table are ordered corresponding to the sequence in which these operations occur during exection, save 
that advcction operations along each coordinate axis in the "TRANSPRT" section are cyclically permuted from one 
time step to the next. 

B. THE 3-D DISCRETE GAS HYDRODYNAMIC EQUATIONS 

B.l. Metric Factors 

ZEUS-MP expresses the discrete fluid equations in the coordinate-independent fashion documented in Stone & 
Norman (1992a). For convenience, we reproduce the basic metric definitions here. The metric tensor, gij, relates the 
length, ds, of a line element in one coordinate space, y'^, to the equivalent expression in a second coordinate space, a;', 
where we assume that the y'^ can be expressed as functions of the a;*. Thus: 



= (S?) (Sj)'^^''^^' ^ 9ijdx^dx^, (Bl) 

where k is summed from 1 to n, where n is the number of dimensions of x. For orthogonal coordinate bases, Qij is 
diagonal; following the convention in Stone & Norman (1992a) we write: 

/ hi \ 

9i,j = { hi \ . (B2) 

V hlj 

In Cartesian coordinates, we then have 

{xi,X2,X3) = {x,y,z); (/ii, /12, /13) = (1,1,1), (B3) 
while in cylindrical coordinates, we have 

{xi,X2,xs) = {z,r,4>); (/ii, /12, /13) = (1,1,^-), (B4) 

and in spherical coordinates, we have 

(a;i,a;2,X3) = {r,9.(py, (fti, /12, /13) = (1, r, r sin6'). (B5) 

Following the convention introduced in the ZEUS-2D papers, the h factors are re-expressed as separable functions of 
g factors which are not to be confused with gi^j defined above: 

hi = I = gi, (B6) 
h2 = f{xi) = g2, (B7) 

h3 = f{xi)f{xs) = 331532. (B8) 

The explicit expressions for 52, 531, and ^'32 are apparent by comparing expressions (B6) - (B8) with (B3) - (B5). 



B.2. Coordinate Meshes 

The staggered-mesh formalism relies upon an "A" mesh, whose points are centered on zone faces, and a "B" mesh, 
whose points arc located at zone centers. The coordinates of the A mesh along each axis are given by xloj, x2aj, and 
x3ak, with corresponding arrays for the B mesh. Associated values for the metric coefficients g2, g31, g32, and the 
derivatives of these coefficients with respect to xl and x2 are likewise evaluated on both meshes and stored in 1-D 
arrays. 

In many (but not all) instances, spatial derivatives are written as functions of volume differences rather than 
coordinate differences. Along the three axes, transformation from coordinate to volume derivatives are written as 

d/dxi .92331 d/dVi, 

d/dx2 ^ 532 d/dV2, (B9) 
d/dx3 d/dVs. 

Scalar field variables (p, e, E, and X(l)) are centered on the B mesh. Velocity and magnetic field vector 

arrays {vlij^k,v2ij^k,v3ij^k)] {blij,k,b2ij^k,b3ij^k) are centered on the appropriate zone faces. Magnetic EMF's 
{£lij^k,£2i,j,kj£3ij^k) are defined at midpoints of zone edges. 



B.3. The "Source Step" Equations 
B.3.1. Body Forces 

In this subsection we document the updates to velocity due to body forces and artificial viscosity, and the updates 
to internal energy due to artificial viscosity and compressional heating. The three components of fluid velocity are 
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updated from body forces due to pressure gradients, self-gravity, rotational pseudo-forces, magnetic pressure, and 
radiation stress according to the following expressions: 



At 



Pi,j,k + Pi-l,j,k 



Y*i,j:k P«— l,_7,fc 



-I- 



i,0,k 



Axlhi 



+ [52°° + <S2f + <S2°°i + <S2f °,] X 



dxl 

1 
8 

9(73 la 
dx\ 

1 



5f2a? {pi,j,k + Pi-i,3,k) 
[53°° + S^Y + 53°°i + S^Zi] X 



v^,,,k + vS^,,,k+v^-i,,,k+vS^i 



3,k+l 



g31a'^g32bj {pij^k + Pi-i,j,k) 
_ (rfl62°° + dl62f + dl63°° + dlbS"") 
2 (Pi,i,fc + Pi-ij.fe) Axl&j 



(Pi,j,fe + Pi-i,i,k) Axlhi 



At 



-I- 



^i,j,k - ^i,j-l,k 



dg32aj 



0x2 
1 



Pi,j,k + Pi,j-i,k ) g2biAx2bj g2biAx2bj 
[53?° + 53°^ + 53™° 53™P] x 



g31big'S2a'^ {pij,k + Pi,j-i,k) 
(d263°° + d263f + (i261°° -h d261f °) 



2 {pi,j,k + Pi,j-i,k) g2biAx2bj 



■2K 



(2) 



(Pi,i,/c + Pi,j-i,k) g2biAx2bj 



At 



Pi,j,fc Pz,j,/i; — 1 



Pi,3,k + Pt,3,k-i J g31big32bjAxSbk g31big32bjAxSbk 
(d3bl?° + rf3blf + (^362?° -h rf3b2f ) 
2 {pi,j,k + Pi,j,k-i) g31big32bjAx3bk 

, 2^(3) ^iJ.fc ~ 



(BIO) 



(Bll) 



^ + Pi,j,k-i) g^ibig32bjAx3bk ' 

Equations (BIO) through (B12) make use of the following functions in the rotational pseudo-force terms: 

52: 



v2ij,k92bi {pi,j,k + Pi,j-i,k) , 
<52r= <,+i,fe526, {pi,j+,,k + Pi,j,k) : 



'53°°= ^- j v3lj ,.g31hg32bj {p^J,k + Pi,j,k-i) , 
'53-^= v3lj^k+i9^'^big32bj {pi,j,k+i + Pi,j,k) , 



53^ 



(B12) 

(B13) 
(B14) 
(B15) 
(B16) 
(B17) 
(B18) 



Similarly, the magnetic pressure terms employ the following: 

{g2bib2i,kf - (fi(26j_i62i_i fe)' 
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dlb2P° : 



^163° 



d2b3°° = 



{g2bib2ij+i^kf - {g'2'bt-ib2i_ij+i^kf 



{gilbib'6 



g2ai 

{gSlbi-ib^i-ij^kY 



g31di 

{g31bib3ij^k+if - (.g31&j_i63i_i 



gSlaf 

{g'62bjb3ij^kf - {g32bj-ib3ij-i^kf 



952a] 

{g32bjb3ij^k+if ~ {g32bj^ib3ij-i^k+if 



(B20) 
(B21) 
(B22) 
(B23) 



d2b3"P= ^ _ „ , 


(B24) 


d2bl"'' = {bU,j,kf - {bh,j-i,kf, 


(B25) 


d2blP" = {bli+ij,kf - {bli+i,j-i,kf, 


(B26) 


d3bl'"' = {blij,kf - {blij,k-if, 


(B27) 


d35F° = (61i+i,,-fc)2 - 


(B28) 


d362°°=(62,j-fc)2-(&2,,,-fc_i)2, 


(B29) 


d362°f = (62i,,+i,fc)2 - (62,,,+i,fc_i)2. 


(B30) 



B.3.2. Artificial Viscosity 

Once the velocity update from forces is complete, velocities which were known at time level "n" are now known 
at an intermediate time level which we designate as level "n + a" . These intermediate velocity components are then 
updated due to the Von Neumann and Richtmyer prescription as follows: define 



Avh, 



k 



10, 



Av3 



v3"+i 



and 



Qh,j,k = CavP{A.vlij^k) ; 
The velocity updates are then computed as 



0, 
0, 



''-^i-l,j,/c' '^^i,j,k ^ '^-^i-l.j.fe' 
'^^i,3,k — "^i-l,j,k' 

v27\:j,k,v2i+;k<v27,pi,k; 

v2lll>v2lp,^,; 
?)3"+" 7i3"+'* ^ 7i3"+'* • 
i,],k — i,],k—l' 



CavP{Av2ij^kf ; q3ij^k = Cavp{Av3ij^kf . 



v2 



n+b . 



t^3"+t : 



:U2"+" 



=u3"+" 

i,3,k 



Q^i.jM — <l^i-l,j,k 

Axlbt (pij^k + Pi^i,],k) /2' 

Q_'^i^j,k Q'^i,j — l,k 

g2biAx2bj {pij^k + Pi,j-i,k) /2' 



g31big32bjAx3bk {pi,j,k + Pi,j,k-i) /2' 
The gas internal energy is simultaneously updated via 



"i,3,k 



= e. 



'i,3,k 



Qh,j,k^vh,j,k _ q'^i,j,k^v\j,k _ Q\j,k^v\j,k 
dxltti 



(B31) 
(B32) 

(B33) 

(B34) 

(B35) 
(B36) 
(B37) 

(B38) 



g2bidx2aj g31big32bjdx3ak 

For problems with strong shocks, an additional linear viscosity may used. ZEUS-MP includes a linear viscosity of 
the form described in Stone & Norman (1992a), in which the linear viscosity depends upon the local sound speed: 

1/2 



3,k 



P Au, 



(B39) 



where C; is a constant (typically of order 0.1) and Av is the difference in neighboring velocities along the coordinate 
under consideration. As with the quadratic viscosity, qlinj^^fc is evaluated independently along each axis. The updates 
to velocity and gas energy are identical to those for the quadratic viscosity save for the replacement of "g" with "qlin" 
in equations (B35) through (B38). 
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B.3.3. Compressional Heating 

For an ideal EOS the pdV compressional heating term is evaluated exactly as outline in Stone & Norman (1992a): 
to improve energy conservation, the updated gas energy can be written as an implicit function of a time-centered 
pressure, whence 

(B40) 



(e"+i-e")/Ai = -p"+i/2y.^^ 
where p"+^/2 = 0.5 (p" + p""*"^) . Using the equation of state, p = (7 — 1) e, (B40) may be rewritten to yield 



e"+^ = 



l-(Ai/2) (7-l)V-Vi,,-,fc 



1 + (AV2) (7-l)V-Vi,,-,fc 



(B41) 



where 



and c"+^ arc the gas energies immediately prior to and after the pdV update. For non-ideal equations of 



state, predictor-corrector techniques or Newton-Raphson iterations over temperature may be employed. 

B.4. The "Transport Step" Equations 

In the transport step, ZEUS field variables are advected through the computational mesh using the technique of 
consistent transport, introduced by Norman et al. (1980). Consistent transport attempts to minimize local conservation 
errors due to numerical diffusion by defining face-centered fiuxes of each field variable consistent with the mass flux 
used to advect the matter density. In this procedure, the quantities advected are the mass density (p), the specific 
internal energy (e/p), the specific radiation energy (E/p), and the specific momenta SI — p vl, S2 = p g2 v2, and 
S3 = p g31 g32 v3. The metric factors introduced into the definitions of 52 and S3 transform these quantities into 
angular momenta in curvilinear coordinates. 



B.4.1. Scalar Variables 

We first consider the advection of mass density along the i coordinate. The amount of mass crossing a cell face 
perpendicular to the i axis in a time step, At, is given by 



M/j_feAt = pi,j,kM {vU,j,k - vgli) At, 



(B42) 



whore Ai is the time-centered area factor for cell i-facc i, and p is the matter density average to cell face i. ZEUS- 
MP uses second-order Van Leer (van Leer 1977) averaging to construct monotonic, upwinded averages of all advected 
quantities. For advection across the i faces, the time-centered area factor (which accounts for grid motion) is 



A, = g2ar'^'g31ar'/\ 
The computed mass flux, Ml - j^, is then used to advect Pi,j,k according to 

Pli^^dvlla^ + (m^,. - M^i,,-,fc) At] /dvlla'l+K 



(B43) 



(B44) 



Consistent transport of the gas and radiation energy densities proceeds by defining specific energies (erg/gm) for 
each of these quantities, averaging the specific energy to cell faces via Van Leer interpolation, and computing fiuxes 
across each face with the mass fluxes computed in (B42). We thus have: 



- 1 M^^. , - ( - 



A<| /dvlla'l+^; 



E' 



n+l 



E , . 1 

- M} 



E 



7 ^iVi,. 



At ) ldvl\a\ 



n-\-\ 



(B45) 



(B46) 



The multi-species composition advection uses the X(l) variables to deflne partial densities, which are then advected 
and converted back to dimensionless mass fractions. Thus: 



At 



■''+ldvllar' 



(B47) 



For the advection of scalar variables across cell faces perpendicular to the j axis, we write mass fluxes and time- 
centered face areas as 

Pi,j,k^j {'"^i,j,k - vg2j) , (B48) 



and 



.g316" .932a 



1+1/2 



dxla'l/dvlla'}. 



Advection of p, e, E, and X(l) along the j coordinate then proceeds as 

'pli,kdvl2a^ + - M2 Ai] /dvl2a 



n+l 
■3 ' 



(B49) 



(B50) 
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At } /dvl2a 



n+l 



At I /dvl2a]+\ 



and 



Similarly, the advection of scalar quantities along the k axis is done as follows: define 



and 
Thus 



n+l 
f^i,3,k 



A = g2b^ dxla^ dx2a]/ {dvlla^ dvl2a]) . 



^'l^!k = {^l3,kdvli^l + 



V3 



PJk 



,3,k 



PJk+1 



^Lk+1 



.1 



^Lk 



M. 



>3 



py k+1 



At I /dt;/3a^+\ 
Atj /dvl3al+\ 



(B51) 
(B52) 

(B53) 

(B54) 
(B55) 
(B56) 

(B57) 
(B58) 



and 



mli^Akdvl^^l + {x{l)uMl,^^ - X{l)k+iMlj,,+,) At] I (/l^ldvlZal^^) . (B59) 



B.4.2. Momentum Variables 

Each component of the specific momentum is computed (modulo metric factors) by dividing the appropriate velocity 
component by an arithmetic average of the density at the corresponding cell face. Thus 

Sli,j,k = 0.5 vUj,k {Plj,k + P7-i,j,k) . (B60) 

S2ij,k = 0.5 v2ij,k {Plj,k + Ph-i,k) g^bi, (B61) 

53i,,- fc = 0.5 v3ij,k {Plj,k + Plj,k-i) 531bi 5326,- (B62) 

Along the i coordinate, the specific momenta are transported according to the following: 

dvllb^ + Sli - Sli-i] ldvllh^+^; 
S2lj^k dvlla2 + S2i - S2i^ /dvlla^+'^; 

S3l. dvllal + S3^ - S3^^ ldvlla'l+^. 



1,3. k ■ 



(B63) 
(B64) 

(B65) 



Note that the volume factors used to transport S2ij_k and S3i_j^k differ from those used to transport Slij^k owing to 
the different centering of S2ij,k and S3i,j^k with respect to the staggered i mesh. The momentum fluxes are constructed 
from the previously computed i components of the mass flux as: 

(B66) 
(B67) 
(B68) 



= {mIj,, + M^i,,,.) ^1 (0.5 g2br'^' gilbr'^' 

52, = (m^,._i,, + Ml^^,) V2 (0.5 52a? ff2a^^/' gSla^^'^^) , 

53, = (m1,.,_i +M^^-fc) ^3 (0.5 531< 52ar+'/' 5310^'/') ■ 



In the definition of the momentum fluxes, vi, V2, and vs denote cell-centered Van Leer averages of the three relative 
velocity components: vlij^k — vgli, v2ij^k — vg2j, and vSij^k — vg3k- 
Along the j coordinate, momentum advection is computed via 



CI "+1 



S^j - Slj+i 



52" 



di;/26" + 52,_i - 52, 



/dvl2a]+'^; 
/dvl2b''+'^; 



53^^. dvl2a] + S3j - S3j+i /dvl2a]+\ 



(B69) 
(B70) 
(B71) 
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with 



(Mf_i^^. + M^^. 5i (0.5 gSla^ g32a 



n+1/2 



dxib7 dvim 



S2i 



(B72) 

(B73) 

(B74) 
(B75) 

As with the definitions of momentum fluxes along the i axis, the v terms in the j-flux expressions represent Van Leer 
averages of the relative velocity components, but the numerical values differ owing to the change of axis. 
Finally, the A;-axis equations for momentum advection are written as 



53,- = + m2 . ,) ^3 (0.5 {g^lh^f 532< 532<+^/' dxla^ dvllaf) . 



51^^- fc dvlMl + Slk - Slk+i /dvlSa^ 



n+l. 



821^ ,^ dvl3al 



S2k — S2k+i 



-.n+l. 



53^^- dvl2,hl + S3fe_i - 53^ 



/dvlial 
/dvmi+\ 



(B76) 
(B77) 
(B78) 



with 



Slk = 


{mI 


-i+Mf) 


vi (0.5 520^ dxlb"^ dx2a]/dvllbi dvl2aj) ; 


(B79) 


S2k = 


(mi 


-i+Mf] 


{is (0.5 {g2hV:f dxla'y dx2U] / dvllaJ} dvl2bf) ; 


(B80) 


S3k = 






1 (0.5 5316^ £f267 dxla^y g32b^ dx2a]/dvllaf dvl2a]) . 


(B81) 



C. THE 3-D DISCRETE MHD EQUATIONS 
C.l. Construction of the EMF's 

In a 3-D geometry expressed upon covariant mesh variables, the characteristic equations for Alfven wave propagation 
(43) along the 1-axis become 

^^(Wi,,-,fc) ± ^{vhj,k) = ±S, (CI) 
in which the Lagrangean derivative is expanded as 



d 



d 



d 



the Alfven velocities are given by 



.(2) 
,(3) 



= 62/V^, 

=b-i/^p, 



(C2) 

(C3) 
(C4) 



and iS is a source term arising from derivatives of the coordinate metric factors (= in Cartesian geometry). We 
difference the temporal derivatives along each characteristic as 



P+[61] = 61* -61+, 


(C5) 


r'-[6i] = 6i*-&i-, 


(C6) 


V+[vl]=vl* --vl+, 


(C7) 


V-[vl]=vl* -vl-, 


(C8) 




(C9) 



and solve the characteristic equations for vl* and 61*: 

61* = 

and 



61+ 61- 



+ vl+ - vl~ 



vl* 



1 



t;l+-v/p+ + vl~\/p- + 61+ - 61" 



+ S/^t. 



(CIO) 
(Cll) 



As discussed in the main text, /0+ and p are the densities at the footpoints of the respective characteristics. Equa- 
tion CI, in view of equations (C2)-(C4), suggests that when evaluating SStj^k as outlined in section 3.2, vl and 61 
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should be upwinded along both the 2- and 3-components of the characteristic velocity. The numerical impracticality 

of this approach leads us to adopt the approach of Hawlcy & Stone (1995), in which only partial characteristics arc 
used to upwind velocity and magnetic field components. Quantities upwinded along Alfven characteristics are then 
combined with quantities upwinded along hydrodynamic fluid-flow characteristics in a self-consistent fashion. We 
illustrate the procedure by outlining the calculation of e3, written schematically as 



e3 



vl*b2* - v2*bl* 



(C12) 



To aid the documentation, we introduce an "ADV" function in which ADV[62, vl] denotes a mean value for b2 
computed from a Van Leer average upwinded according to the vl velocity. This functional notation will be used to 
describe quantities upwinded along coordinate axes (fluid flow characteristics) or Alfven characteristics. Our method 
subdivides into two stages. In stage I, partial characteristics along the 2-axis are used in the construction of values for 
vl* and bl* as follows: 
Step la: upwind 62 and v2 along 1-axis: 



62*^^=ADV[62,wl]; 
i^^^^=ADV[z;2,z;l]. 

Step lb: compute 2-characteristic Alfven speeds: 

--v2'-'^ + \b2'''\/Vp- 



(2-) —(1) , iT:?r(i)i 



Step Ic: upwind vl and bl along the +/- characteristics: 





= ADV 






= ADV 


vl,v^'^-'> 




= ADV 






= ADV 





Step Id: solve the characteristic equations (CIO and Cll) for 61* and vV 



bl* ■ 



61 



(2+) 



61 



(2-) 



(C13) 
(C14) 

(C15) 
(C16) 

(C17) 
(C18) 
(C19) 
(C20) 

(C21) 



vl* = 



'P+ + \Jp- 



+ SAt 



^(1) 



+ 61 — 61 



(C22) 



Step le: compute and store the products vl* 62 and v2 bl*. 

Stage II is analogous to stage I, except that now we solve for v2* and 62* by examining partial characteristics in the 
1 direction: 
Step Ila: 



Step lib: 



Step lie: 



6l'^^=ADV[61,t;2]; 
tn:^^'=ADV[vl,u2]. 



„(i-)=iyT(2) + |6T^^Vv^- 





= ADV 


'v2,v^'+^' 




= ADV 


v2,v'-^-'^ 


62^^+) 


= ADV 


62,1.(1+)' 


62^^-) 


= ADV 


62,i;(i-)" 



(C23) 
(C24) 

(C25) 
(C26) 

(C27) 

(C28) 
(C29) 
(C30) 
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Step Ild: 



62* 



■t:7(i+) t:t(i-) 



v2* 



v2 



{1+) 



(1- 



(1+) 



62 



{!-) 



+ 5 At 



^2) (2) 

Step He: compute and store the products v2* 61 and vl 62*. 



With these two stages complete, we now write the 3-EMF as 



e3 



t;l*62^^^ +ul^^^62* 



t;2*61^^^ +t;2^^^61* 



2 2 
The 1-emf and 2-emf expressions are derived and expressed analogously as 



and 



el 



e2 = 



t;2*63^^^ +u2'^^63* 



w3*61^^^ +t;3^^^61* 



i;3*62^^^ +t;3^^^62* 



t;l*63^^^ +t;1^^^63* 



(C31) 



(C32) 



(C33) 



(C34) 



(C35) 



Because each component of the magnetic field (e.g. 61) depends upon EMF's computed around both transverse axes 
(e.g. e2 and e3), the evolution of each B-field component will depend upon the full set of characteristics. This method is 
effectively a simple directional splitting of the full MOC algorithm. As discussed in Hawley & Stone (1995), each term 
in a given EMF expression is composed of 1-D advection solutions in which hydrodynamic characteristics are mixed 
with Alfven characteristics in a consistent fashion; i.e. in the leading term of equation C33, 62 has been passively 
advected along the same coordinate axis for which the characteristic velocity equation is solved. This consistency 
is maintained in all terms of the EMF equations. Additional discussion in Hawley & Stone (1995) notes that the 
practice of consistently mixing partial Alfven characteristic solutions with hydrodynamic advection retains the relative 
simplicity of 1-D upwinding yet is less prone to error in the presence of strong magnetic discontinuities. 

C.2. Lorentz Acceleration of Velocities 

The Lorentz accelerations are computed by a procedure analogous to the calculation of the EMF's outlined above. 
In what follows we make extensive use of the notation introduced in the previous section. We demonstrate the method 
in detail by writing expressions for the 1-component of the Lorentz acceleration, which depends upon information 
propagating along Alfven characteristics in the 2- and 3-directions. Stage I of the solution considers the Alfven 
2-characteristics : 

Step la: define footpoint densities as 

a/2 



Step lb: define average of b2ij^k and compute Alfven speeds: 



iPi,j-l,k ■ Pi-l,j-l,k) 



62j=0.5 (62ij,fe + 62i_ij,fc) ; 



,,(2+) _ 



„(2-) . 



= -|62|/ 



Step Ic: upwind blij^k and vltj^k along the (+) and (-) Alfven characteristics: 



61(2+) =ADV 

6l(2-)^ADV 
1,1(2+) =ADV 
ul(2-)=ADV 



61, V 



(2+)' 



61,1; 



(2- 



vl, V 



(2+) 



vl,v 



(2-) 



Step Id: solve characteristic equation for 6: 



61^ 



61 



<2+) 



+ 



61 



(2-) 



+ SGN 



[1,62]. (■ 



vl 



(2+) 




(C36) 

(C37) 
(C38) 
(C39) 

(C40) 
(C41) 
(C42) 
(C43) 

(C44) 
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where "SGN [l,62]" is plus or minus 1 depending on the sign of 62. Finally, 
Step le: evaluate the first contribution to the Lorentz 1-acceleration: 

51^,, = (62,+i +b2j) (611;, - blf) I {g2ai dx2aj) . 

Stage II examines evolution along the Alfven 3-characteristics as follows: 
Step Ila: define footpoint densities as 

a/2 . 



V p+ = {pi,j^k • Pi,j,k-l) ; VP = {Pi,j,k-1 ■ Pi-l,j,k-l) 

Step lib; define average of b^ij^k and compute Alfven speeds: 

53fe =0.5 (63ij,fc + 62i_ij,fe) ; 



1/2 



,,(3+) . 
„(3-) . 



Step lie: upwind blij^k and vlij^k along the (3+) and (3-) Alfven characteristics: 



61(3+) =ADV 
6l(3-)=ADV 

t;l(3+)=ADV 

vl(3-)=ADV 



61, t; 



(3+) 



bl,v 



(3- 



' '^A 



1 (3-) 



Step Ild: solve characteristic equation for 6: 



6l3* = 



-61^^+^ 61^^-^ 



+ 



+ SGN 



[1,63] • (■ 



■— j-(3+) ^(3- 

vl — vl 




Step He: add the second contribution to the Lorentz 1-acceleration to the first: 

Slfj^^ = (62,+i +62,) (6l|;i - 6lf ) / (52a, dx2aj) 

+ (63fc+i + b3k) (6l|;i - 61|*) / {gSlai g32bj dxSak) . 
The 2- and 3-components of are similarly written as 

52^^._fe= (63fc+i + 63fc) (623;, - 62f ) / {gSlai g32bj dxSak) 

+ (6li+i + bli) (62i;i - 62^*) / {g2bi dxlai) ; 
S3^,. fe= (6Ti+i + 6Ti) (63i;i - 63^*) / (5316, dxla^) 

+ (62,+i +62,) (632;, - 63f ) / {g2ai g32bj dx3ak) . 
With the accelerations thus defined, the fluid velocities are accelerated according to 



'~ k I V Pi,i,fe • Pi,j-l,k', 



Kk-l, 



(C45) 

(C46) 

(C47) 
(C48) 
(C49) 

(C50) 
(C51) 
(C52) 
(C53) 

(C54) 

(C55) 

(C56) 
(C57) 

(C58) 

(C59) 
(C60) 



where the n + b superscript denotes velocities which have been updated in the source step via local body forces (step 
"a") and artificial viscosity (step "b"). 

C.3. Evolution of the Field Components 

With EMF's suitably computed on each edge of the 3-D grid cell, the three magnetic field components are evolved 
from time level n to level n + I via the 3-D CT formalism. As with the gas hydrodynamic advection equations, 
the HSMOCCT algorithm is formulated to account for grid motion along all three coordinate axes, thus special care 
must be taken to time-center spatial coordinate terms correctly. The line integral equations describing the temporal 
evolution of the magnetic fluxes through the i, j, and k cell faces were introduced in §3.2; for ease of reference we 
repeat them here: 

<P^lt,l-^^h,k _ 
At 



= e2ij^kAx2ij + e3j,j+i,feAa;3i,j+i,fe — e2ij^k+iAx2ij — eSij^k^xSij^k', 



(C61) 
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' ' — = elij,fe+iAxlj + e3ij+i,fcAx3jj,fc - eljj,feAxli - e3j+ij,feAa;3j+ij,fe; (C62) 

— ^ = elij\feAxli + e2j+ij,fcAx2i+ij - el^ j+i^fcAxlj - e2ij_feAx2ij. (C63) 

The magnetic fluxes are related to the field components and covariant metric tensor coefficients through the following 
relations: 

(pl = h2 /i3 dx2 dx3 bl g2ai g'ilai g32bj dx2aj dx3ak blij^k] (C64) 
(j}2 = hi hi dxl dx3 62 — > g31bi g32aj dxlai dxSa^ 62jj^fc; (C65) 
(j}3 = hi /i2 dxl dx2 63 — > g2bi dxlai dx2aj b3ij^k- (C66) 

Cell edge line elements are transformed to covariant coordinates via 

Axl^dxla,; (C67) 
Ax2-^g2a, dx2af, (C68) 
Ax3— ^gSlOi g32aj dx3ak- (C69) 

The evolution equations for blij^k, b2ij^k, and 63^^^^ are then written as: 

j-^fc)""''"^ 61"^^ = (Alij^k)" bVlj i. + At[£2ij^k + S3ij+i^k — S2ij^k+i — S3ij^k] ', (C70) 

{■A.2ij^k)"'^^ = {A2ij^k)" ^'^i,j,k + At [£3ij,k + £U,j,k+l — £^i+l,j,k — £U,j,k] ; (C71) 

{■^^i,j,kT^^ b3'^j l = {A3ij^kTb3i j k + A.t[£lij^k+£^i+l,j,k- £'i-i,j+l,k- £^i,j,k]- (C72) 

Equations (C70) - (C72) make use of area factors, Alij^k, A2ij^k, and A3ij^k, which are simply the metric coefficients 
multiplying the corresponding b component in equations (C64) - (C66), evaluated at time level n or n + 1 according to 
the associated superscript. In ZEUS-MP, the "emf[l,2,3](i,j,k)" arrays store the £ values indicated in (C70) - (C72), 
and are given by the true EMF components multiplied by the appropriate time-centered line element: 

£U,j,k = eli j,fe (da;lai)"+'/' ; (C73) 
£2ij^k = e2ij, fe {g2ai dx2ajf+^'^ ; (C74) 
£\3,k = e3i j,fe {g31ai g32aj da;3afe)"+^/^ . (C75) 

D. THE 3-D DISCRETE RADIATION DIFFUSION MATRIX 

D.l. Flux Limiters 
We write the three components of radiation flux as 

F1 = -D1(VE)\ (Dl) 
F2 = -i:>2(VE)^ (D2) 
F3 = -L'3(VE)-\ (D3) 

where the quantities (-D1, -D2, D3) represent flux-limited diffusion coefficients computed independently along each axis, 
and the superscripts on VE indicate the appropriate component of the gradient operator. Recall from equation 11 
that each diffusion coefficient takes the following form: 

ZEUS-MP currently implements two forms of the flux-limiter, Ae- The first is due to Levermore & Pomraning (1981), 
(c.f. equation 28 of their paper): 

where R is given by 

ii=||VE||/E. (D6) 
The second option is a construction derived by Minerbo (1978): 



2 



R< 1.5; 



AE(Mi) = (D7) 



36 

where R is as defined previously. An important feature of the implementation is that the numerical value of R is lagged 
in time because it is evaluated with converged values of E from the previous time step: 

IIVE^.. II 

RiJ,k = (D8) 

This choice preserves the linearity of our discrete solution for 

Because V-F must be defined at cell centers for consistency with E in (4), the flux components are considered to 
be centered on cell faces. This introduces an additional subtlety in the computation of diffusion coefficients, as the 
opacities (%) and R values (and hence VE") must be colocated with F. Thus, while R is manifestly a scalar quantity, the 
face-centered opacity must be computed from an average of neighboring cell-centered values whose spatial relationship 
depends upon the cell face in question. Face-centered gradients in E are subject to a similar constraint. At a given cell, 
each component of flux acquires a (generally) unique value of the E-dependent flux-limiter, which further underscores 
the simplification gained by time- lagging the evaluation of i? as a function of E. 

D.2. The Matrix 

Recall from §3.3 that the discrete radiation and gas energy equations solved in the ZEUS source step are written as 



f(l) _-pn+l -pn A J. 

f(2) _„n + l _ p" — 



4^.cpBp - cn^El+l - V-Fl+l - Vv : P^+i 
-47rKpBp + ckeE'I+I - pV-v 



(D9) 
(DIO) 



Our derivation of the FLD matrix proceeds by first differentiating equations (D9) and (DIO) with respect to e^.j^fe and 
Eij.fe. Considering first the radiation energy equation, we note that fl^jj^ depends on the value of Ci,j^k through the 

evaluation of Bp, which requires an (in general) energy-dependent material temperature. The dependence of /f ^''j, on 
E is more complex, owing to the flux-divergence term. As will be documented below, V-F^j^fc is written as a 7-point 
function in E coupling Eij^k to nearest-neighbor values along all 3 coordinate axes. Evaluating the Jacobian for the 
radiation energy equation will yield a system of the following form: 

■^i,j,kSEi,j,k-l + ^i,j,k5^i,j-l,k + Cij^k5Ei-l,j,k + T^i,j,kSi^i,j,k + ^T^^^\ 
^i,j,kSEi+l,j,k + ^i,j,kSEi,j+l,k + Gi,j,kSEi,j,k+l + 'HiJ.k^^iJ.k = —fi^j^k^ 

where Aij^k, through Hij^k are given by: 



A , — ■'i.j,'' ■ R. . , — ■''■J. 



k-i 



^i,j,k - gE"+;*J ■^''■'■'^ dEl 



i+i ) 



.i.j." (D12) 



'-'l,J,k — opn+l ) •'i,],k 



n. . , ■'i.j.k . -ly. . _ "Jj.j .k 



i,J,k 



Because the gas energy equation involves no space derivatives in the solution variables, the Jacobian expression is 
considerably simpler: 

■j^Seij^k + j^^SEij^k = -fllk- (D13) 

The fact that /jj^fe depends on the gas energy only through e^jj^ allows Seij^k to be written algebraically as 

fS,k+{9fS,k/9KU)s^i,J,k 
d&/d^l^'k 



^^iik = 777^ • (D14) 



Substitution of (D14) into (Dll) eliminates the explicit dependence of the radiation energy Jacobian on Ssij^k, resulting 
in a reduced linear system for the radiation energy corrections: 

Ai,j,k ^^i,j,k — l ~l~ 

iSiJ^k S^i,j-l,k + 

Ci,j,k SEi-ij^k + T^'ijk + ^i,3,k <^Ei+i -|- 

Tllk5Ei^j+i,k + ^^^^> 

Gi,j,k ^^i,j,k+l 



37 



where 



V': 



qAI) 
Ji,3,k 



(2) 



(D16) 



(D17) 



The H coefficient of Seij^k has been absorbed into V; coefficients B and T> through H remain unchanged. The 
terms on the LHS of (D15) have been been arranged along multiple lines in a manner illustrating the band structure 
of the resulting matrix, which is described by a tridiagonal structure coupling points (i-l,j,k), (i,j,k), and (i+l,j,k), 
accompanied by subdiagonals coupling points (i,j-l,k) and (i,j,k-l) and superdiagonals coupling points (i,j+l,k) and 
(i,j,k+l). 

Equation (D15) is equivalent to a matrix equation of the form A^^E = TZ, where is a 7-banded matrix 
whose diagonals are specified by the values of A through Q. As with the covariant form of the Poisson equation 
matrix (appendix E), Ai may be symmetrized by multiplying each row by a total volume element for zone (i,j,k): 
AVlaiAV2ajAV3ak- Written in this way, it is necessary to evaluate (and document) only the five bands V through 

G, and the RHS vector, TZ. 
The main diagonal of the symmetrized matrix is given by 



X AVlaiAV2ajAV3ak. 



(D18) 



We evaluate the four required function derivatives as a function of a time-centering parameter, 9, such that 6 = 1 
gives fully implicit time differencing. (The time step. At, is by definition time centered.) We present the derivatives 
in order of increasing complexity, thus: 

^^^'^ = -eAtcK^l+l; (D19) 



-9At 



n+e 



K-p 



i,j,k 



(D20) 



1;J,k 



l + OAt 



+ 6 At 



n+e I ^-^F \ I ^ n+e 



( 



n+e 



,3,k 

n+e 

i,j,k 



dB 
de 



n+e 



i,j,k 



The final derivative expression is written schematically as 



= I + eAt 



d 



^<J,l + ^^(V-Fi,,,.) + ^^(Vv:P),,,, 



i,j,k 



BE] 



(D21) 



(D22) 



Because we assimic that P = fE, the final term in (D22) is simply Vv : f , where f is assumed known and held fixed 
during the N-R iteration. To evaluate V-F, we assume that the three components of F are given by (Dl) - (D3), and 
we express the divergence operator in covariant coordinates using equation (116) of Stone & Norman (1992a): 



V-F = 



1 



9(ff26'3iFl) 



o o 

— {h2h:,Fl) + —{h,h3F2) + _(/ii/i2F3) 

0x1 0x2 0x3 



1 a(.932F2) 



1 a(F3) 



(D23) 



dVi 52 dV2 931932 dVs 

where in the latter expression we have transformed spatial derivatives into volume derivatives. A similar operation is 
performed on each component of the gradient operator: 

1 (9E 1 9E 1 aE 

hi dxi ' ft,2 ' dx^ 



VE = 



/ 5E dE 1 dE\ 



(D24) 
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With (D23) and (D24) in liand, we may construct a discrete form of V-F explicitly in terms of the 7-point stencil in 
fe, from which derivatives of V-Fij^k with respect to the appropriate E variables may be read by inspection. We 
provide the result here: 



2 / 



g2bfAV2aj \ AV2bj+i 



{g'32aj)^ D2ij^k f^i,j,k — ^i,j-i,k 



g2blAV2aj \ AV2hj 



{g3lbig32hj) AV2,ak \ ^Vibk+i 



{gSlbig^jf AV3ak V AV3bk 
The middle term in (D22) then follows at once: 

dV-Fjj^k _ {g2ai+ig31ai+if Dl,+ij^k (g2a^g31a,)^ DU,j,k 
dEij^k ~ AVlaiAVlbi+i AVlaiAVlbi 

{g32aj+if D2ijjf.i^k , ig32ajf D2i^j^k 



(D25) 



g2b'^AV2ajAV2bj+i g2b'^AV2ajAV2b 



{gm^g32bjf AV3akAVSbk+i igUb,g'32bjf AVSakAVSbk 

The three super-diagonal bands of the symmetric matrix, Sij^k^ ^i,j,kj and ^i,j,fe, originate in the derivatives of 
V-Fjj^fc with respect to Ej+ij^fe, Ejj+i^fc, and Eij^k+i, respectively. We therefore have: 



= J^'"'' X AVlaiAV2ajAV3ak 

Otii^lj^k 



{g2a,+,gna,+,f DU+,j,k ^ ^v2ajAV3ak; (D27) 



AVlh+i 



(1) 

^!'i"^ = ^^^x AVlaiAV2ajAV3ak 

Oi^i,j+l,k 



= -^^^ ^-'^'Sat/o.'"'^''' X ^Vla^^V'Sak-, (D28) 
g2bfAV2bj+i 

gAl) 

QlTk = J'''^' X AVlai AV2ajAV3ak 

Otjij^k+l 

= -9 At D3^,3,k+i ^ AVlaiAV2ai. (D29) 

{g31big32bjf AV3bk+i 
Finally, the RHS of the symmetrized linear system is evaluated as 

^ I \d0/d^') ~ ^''''"j ^^l«^^^2a,Ay3afe, (D30) 

with df^^ljdel+l and df^^^Jdel+l given by equations (D20) and (D21), respectively. 

E. THE 3-D DISCRETE POISSON EQUATION MATRIX 

The 2-D form of Poisson's equation was written (although not formally derived) in Stone & Norman (1992a); here 

we extend the discrete expression to 3-D and explicitly derive and document the matrix elements. Following Stone & 
Norman (1992a), wc write the general tensor form of the Laplacian operating on a scalar function, as 



/ll/l2/l3 



dx\ \ hi dxi J dx2 \ h2 8x2 J 8x3 \ hs 8x3 



(El) 
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The inner partial derivatives of $ are rewritten as functions of ZEUS metric coefficients: 

9$ 531532 9$ _ 52 a$ 



91 



52531532^—; 92 = — — 13 = — — ; (E2) 

dXi 52 0X2 531532 0x3 

and the outer derivatives over the "q" functions so defined are transformed into volume derivatives and written in 
discrete form as 

1 dqi ^ 1 / glj+i - q 
532 dVi g32bj \ AVla^ 
1 dc/2 1 / q'2j+i - q'2j 



52531 dV2 

1 5^3 



g2big51bi 
1 



/ 93/s+i — q3k 



(E3) 
(E4) 
(E5) 



52531532 dVs g2big31big32bj \ AV3ak 
The derivatives inside of the q functions are left as discrete coordinate differences: 

qU = g2aig31aig32bj {^i^j^k - ^i-i,],k) / Axlb^, (E6) 
q2j = {g31big32a,/g2b,) ^ - $i,j-i,fe) / Ax2bj, (E7) 
q3k = {g2bi/g31big32bj) {^ij,k - ^i,j,k-i) / Ax3bk. (E8) 

Leaving the inner derivatives as functions of coordinate (liffcrcnccs was done for consistency with the formulation in 
the public ZEUS-2D code. We have also formulated the linear system for the case in which the inner derivatives 
are also transformed into volume differences. We have not discovered an application in which this distinction has a 
measurable effect. Wc therefore adopt the former approach for the purposes of this document. Evaluating (E3) - (E5) 
with (E6) - (E8) yields 
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1 dq2 
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{V2^ij+i^k ■+ 
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g3lb,g32bjj \AV3ak , 
{T'S^.j-fc+i + I?3$ij,fe + M3$ij,fc+i}. 
In (E9) - (Ell), the V, V, and M. functions are written as 
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The LHS of the discrete Poisson equation may be constructed by a direct summation of expressions (E9) - (Ell). 
Such action results in a 7-bandcd sparse matrix (cf. equation 58) in which elements along the main diagonal arc given 
by the sum of the 3 T) expressions listed above, multiplied by the inverse volume factor (l/Ayia)^. Similarly, the 
first superdiagonal band (coupling $i,j,fe to ^i+i,j^k) is given by the VI expression multiplied by the associated volume 
factor in (ElO). The remaining two superdiagonals and the three subdiagonal bands are derived in analogous fashion. 
The matrix may be symmetrized, however, if expressions (E9) - (Ell) are first multiplied by a total volume element 
Ay = Ayia^ AV2ajAy3a/j. The resulting transpose symmetry allows explicit calculation, storage, and operation 
upon the three subdiagonals to be avoided. The symmetric linear system may be written symbolically (compare with 
equation 58) as 

S4^*i+l,j,fe + S5^j,j+l,fc + S6^i,j,fe+1 + 

ST^ij^k = 47rG (Ay Itti Al/2aj AySflfc) j,fc, (E21) 



with 



54 = Ay2a,AF3afe x VI + , ' ' x V2 + 

Ayia,AV2a, ^„ 

55 = AV2ajAV3ak x VI, (E23) 



Ayia,Ay2a, ^„ 

--= (,316W ^'^'- ^^^'^ 
F. lA/IPLEMENTATION TECHNIQUES AND STRATEGIES 

The ZEUS algorithm solves the partial differential equations describing astrophysical fluid flows by means of an 
"operator-split" finite difference scheme. The field variables are advanced in time through a series of substeps corre- 
sponding to each operator (physical process) contributing to the full evolution equations. Whether the field variables 
are updated in an explicit or implicit manner, they use values of quantities computed during the previous substep. 
Therefore, a parallel algorithm in which multiple substeps are executed concurrently is not feasible. Instead, our 
parallelization strategy is based on domain decomposition, in which the spatial mesh is divided into "tiles" and the 
field variables are updated in each tile concurrently. Each substep in the time-stepping scheme is completed in all tiles 
before moving on to the next substep, so that the time levels of all variables remain synchronized between tiles. 

Gradients and other spatial derivatives appearing in the evolution equations are approximated by linear combinations 
of field variable values evaluated at discrete points in a set of several neighboring mesh zones comprising the "stencils" 
of the difference operators. Evaluating spatial derivatives in mesh zones near tile boundaries requires values of some 
quantities at locations in zones belonging to neighboring tiles. Therefore, before we can update the field variables 
in zones near the boundaries of a tile, we must receive some data from neighboring tiles as required by the stencils. 
We perform the required exchange of data between tiles by means of "message passing" , using the MPI library. MPI 
enables the code to execute efficiently on many types of parallel architectures, from systems with globally shared 
memory to clusters of workstations. 

Optimal parallcd efficiency is achieved by minimizing the ratio of communication overhead to computational work 
(updating field variables). The amount of data that needs to be exchanged between tiles is proportional to the number 
of zones near tile boundaries (not physical boundaries, unless periodic boundary conditions are prescribed there). We 
therefore minimize the ratio of the number of zones near tile surfaces to zones in tile interiors by decomposing the 
domain along each active spatial dimension. We balance the load by assigning nearly the same number of zones to 
each tile. 

Communication overhead involves more than merely the transit time for the messages (which is proportional to 
message size, i.e., the number of array elements). It also includes network latency (same for any message size), time for 
the CPUs to copy/pack/unpack the data to be passed, and context switching delays as the CPUs alternate between 
updating variables and passing messages. Fortunately much of the communication overhead is comprised of idle cycles, 
some of which can be spent doing other useful work, provided one makes use of the "non-blocking" communications 
operations in MPI. 

One way to reduce communication overhead is to minimize the number of mesages that are sent. Of particular 
concern is the exchange of data between tiles that share only one corner point. Only a few zones near a corner require 
any data from tiles sharing only that corner, but each tile has 8 corners, each of which are shared by 7 neighboring 
tiles. Each tile also has 8 edges which are shared by 3 neighboring tiles. In contrast, each of the 6 tile faces has at 
most just 1 neighboring face. We can avoid passing a large number of small messages by exchanging messages across 
tile faces in 3 stages, sending and receiving messages along just one dimension per stage (see Figure Fl). We begin the 
next communication stage only after the previous stage is completed. Data from neighboring tiles is stored in the 2 
layers of "ghost" zones on the surfaces of each tile. This ghost cell data is included in all messages and automatically 
carries edge and corner cell values to the tiles that share only those edges and corners. 
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Fig. Fl. — MPI communication flow in ZEUS-MP. 

In some of the substeps in the ZEUS algorithm, such as advection along the "i" direction for pure hydrodynamics, 
updating the field variables involves relatively little computational work. In such cases, we employ a more agressive 
strategy to overlap more communication operations with computations (at the expense of a more complicated code). 
We subdivide the zones in each tile into 3 roughly equal groups, so that one third of the interior zones can be updated 
while the messages for each communication stage are in transit. After the messages for a given stage are received, we 
update field variables in zones near tile boundaries for which all data is available. The precise procedure is as follows: 

1. Boundary data is exchanged with neighbors along the i-axis while updates are performed for the 1st third of 

interior zones. 

2. Boundary data is exchanged along the j-axis while updates are performed (a) for the values of i skipped in step 

(1) in the 1st third of interior zones, and (b) for the 2nd third of interior zones (all i). 

3. Boundary data is exchanged along the k-axis while updates are performed (a) for the values of j skipped in step 

(2) in the first two thirds of interior zones, and (b) for the 3rd third of interior zones (all i and j). 

4. Updates are performed for the values of k skipped in the previous steps (all i and j). 

The procedure outlined above is adopted for several stages in the source step portion of the hydrodynamics update, 
including the artificial viscosity, body forces, compressional heating (pdy), and EOS updates. Because the radiation 
module employs an implicit solution which updates E^^j^fc and Qi^j^k at all mesh points simultaneously, partial mesh 
updates are not possible. A related procedure is employed, however, in both the subroutines which compute matrix 
elements and within the CG linear solver routine which returns corrections for '&ij,k during a Newton- Raphson itera- 
tion. By construction, the FLD stencil never accesses data which lies outside of a tile boundary along more than one 
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axis; i.e. "ghost corner" zones are never accessed. Because of this, the asynchronous MPI exchanges of tile faces may 
be initiated for all three axes simultaneously, since only ghost corner cells depend upon the ordering of face updates. 
Within the FLD module, therefore, we exploit a simplified procedure in which updates are performed along tile faces, 
MPI "ISEND" and "IRECV" calls are posted for face data along all axes, interior updates are performed, and an 
MPI "WAIT" operation is performed to ensure that message-passing has completed before proceeding. This allows 
considerable opportunity for overlapping communication with computation both in the evaluation of matrix elements 
and the processing of matrix data dming the CG linear solution step. 

At the very end of a time step, the size of the next explicit time step is computed from the minimum value of 
the Courant limit over all zones. MPI and other message-passing libraries provide routines to perform such global 
reduction operations efficiently in parallel. The time spent waiting for this operation to complete comprises only a 
fraction of the total communication time. 
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